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We develop and demonstrate an acceleration of the Liu & Tegmark quadratic estimator formalism 
for inverse variance foreground subtraction and power spectrum estimation in 21 cm tomography 
from 0(N 3 ) to 0(N log AT), where N is the number of voxels of data. This technique makes 
feasible the megavoxel scale analysis necessary for current and upcoming radio interferometers by 
making only moderately restrictive assumptions about foreground models and survey geometry. 
We exploit iterative and Monte Carlo techniques and the symmetries of the foreground covariance 
matrices to quickly estimate the 21 cm brightness temperature power spectrum, P(k\\,k±), the 
Fisher information matrix, the error bars, the window functions, and the bias. We also extend the 
Liu & Tegmark foreground model to include bright point sources with known positions in a way 
that scales as 0[(iVlog iV) x (N point sources)] < 0(N 5 ^ 3 ). As a first application of our method, we 
forecast error bars and window functions for the upcoming 128-tile deployment of the Murchinson 
Widefield Array, showing that 1000 hours of observation should prove sufficiently sensitive to detect 
the power spectrum signal from the Epoch of Reionization. 

PACS numbers: 95.75.-z, 95.75.Pq, 98.80.-k, 98.80.Es 



I. INTRODUCTION 

Neutral hydrogen tomography with the 21 cm line 
promises to shed light on vast and unexplored epoch of 
the early universe. As a cosmological probe, it offers 
the opportunity to directly learn about the evolution of 
structure in our universe during the cosmological dark 
ages and the subsequent Epoch of Reionization (EoR) 
P~H3]. More importantly, the huge volume of space and 
wide range of cosmological scales probed makes 21 cm to- 
mography uniquely suited for precise statistical determi- 
nation of the parameters that govern modern cosmologi- 
cal and astrophysical models for how our universe transi- 
tioned from hot and smooth to cool and clumpy [SHU] ■ It 
has the potential to surpass even the Cosmic Microwave 
Background (CMB) in its sensitivity as a cosmological 
probe [2D] . 

The central idea behind 21 cm tomography is that im- 
ages produced by low frequency radio interferometers at 
different frequencies can create a series of images at dif- 
ferent redshifts, forming a three dimensional map of the 
21 cm brightness temperature. Yet we expect that our 
images will be dominated by synchrotron emission from 
our galaxies and others. In fact, we expect those fore- 
ground signals to dominate over the elusive cosmological 
signal by about four orders of magnitude [22] [23] . 

One major challenge for 21 cm cosmology is the ex- 
traction of the brightness temperature power spectrum, 
a key prediction of theoretical models of the dark ages 
and the EoR, out from underneath a mountain of fore- 
grounds and instrumental noise. Liu & Tegmark ( |24j . 
hereafter "LT" ) presented a method for power spectrum 
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estimation that has many advantages over previous ap- 
proaches (on which we will elaborate in Section [TTJ) . It 
has, however, one unfortunate drawback: it is very slow. 
The LT method relies on multiplying and inverting very 
large matrices, operations that scale as 0(N 3 ), where N 
is the number of voxels of data to analyze. 

The goal of the present paper is to develop and demon- 
strate a way of achieving the results of the LT method 
that scales only as O(NlogN). Along the way, we will 
also show how LT can be extended to take advantage of 
additional information about the brightest point sources 
in the map while maintaining a reasonable algorithmic 
scaling with N. Current generation interferometers, in- 
cluding the Low Frequency Array (LOFAR, 25 ), the 
Giant Metrewave Radio Telescope (GMRT, [26 ), the 
Murchinson Widefield Array (MWA, [27]), and the Pre- 
cision Array for Probing the Epoch of Reionization (PA- 
PER, [2"8] ) are already producing massive data sets at 
or near the megavoxel scale (e.g. [H]). These data sets 
are simply too large to be tackled by the LT method. 
We expect next generation observational efforts, like the 
Hydrogen Epoch of Reionization Array [301, a massive 
Omniscope [3T], or the Square Kilometer Array [35], to 
produce even larger volumes of data. Moreover, as com- 
puter processing speed continues to grow exponentially, 
the ability to observe with increasingly hne frequency res- 
olution will enable the investigation of the higher Fourier 
modes of the power spectrum at the cost of yet larger 
data sets. The need for an acceleration of the LT method 
is pressing and becoming more urgent. 

Our paper has a similar objective to |33j . which also 
seeks to speed up algorithms for power spectrum esti- 
mation with iterative and Monte Carlo techniques. The 
major differences between the paper arise from the our 
specialization to the problem of 21 cm cosmology and 
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the added complications presented by foregrounds, espe- 
cially with regard to the basis in which various covari- 
ance matrices are easiest to manipulate. Our paper also 
shares similarities to [33]. Like [33], [33] does not ex- 
tend its analysis to include foregrounds. It differs also 
from this paper in spirit because that it seeks to go from 
interferometric visibilities to a power spectrum within a 
Bayesian framework rather than from a map to a power 
spectrum and because it considers one frequency channel 
at a time. In this paper, we take advantage of many fre- 
quency channels simultaneously in order to address the 
problem of foregrounds. 

This paper is organized as follows. We begin with Sec- 
tion [Tl] wherein we review the motivation for and details 
of the LT method. In Section [TTT] we present the novel as- 
pects of our technique for measuring the 21 cm brightness 
temperature power spectrum. We discuss the extension 
of the method to bright point sources and the assump- 
tions we must make to accelerate our analysis. In Section 
IV we demonstrate end-to-end tests of the algorithm and 
show some of its first predictions for the ability of the 
upcoming 128-tile deployment of the MWA to detect the 
statistical signal of the Epoch of Reionization. 



II. THE BRUTE FORCE METHOD 

The solution to the problem of power spectrum esti- 
mation in the presence of foregrounds put forward by LT 
offers a number of improvements over previous propos- 
als that rely primarily on line of sight foreground infor- 
mation [35 -43 . The problem of 21 cm power spectrum 
estimation shares essential qualities with both CMB and 
galaxy survey power spectrum estimation efforts. Like 
with galaxy surveys, we are interested in measuring a 
three dimensional power spectrum. On the other hand, 
our noise and foreground contaminants bear more sim- 
ilarity to the problems faced by CMB studies — though 
the foregrounds we face are orders of magnitude larger. 

The LT method therefore builds on the literature of 
both CMB and galaxy surveys, providing a unified frame- 
work for the treatment of geometric and foreground ef- 
fects by employing the quadratic estimator formalism for 
inverse variance foreground subtraction and power spec- 
trum estimation. In Section [II C| we will review precisely 
how it is implemented. 

The LT formalism has a number of important advan- 
tages over its predecessors. By treating foregrounds as 
a form of correlated noise, both foregrounds and noise 
can be downweighted in a way that is unbiased and loss- 
less in the sense that it maintains all the cosmological 
information in the data. Furthermore, the method al- 
lows for the simultaneous estimation of both the errors 
on power spectrum estimates and the window functions 
or "horizontal" error bars. 

Unfortunately, the LT method suffers from computa- 
tional difficulties. Because it involves inverting and mul- 
tiplying very large matrices, it cannot be accomplished 



faster than in 0(N 3 ) steps, where N is the number of 
voxels in the data to be analyzed. This makes analyzing 
large data sets with this method infeasible. The primary 
goal of this paper is to demonstrate an adaptation of 
the method that can be run much faster. But first, we 
need to review the essential elements of the method to 
put our adaptations and improvements into proper con- 
text. In Sections |II A| and |II B[ we describe our conven- 
tions and notation and explain the relationship between 
the measured quantities and those we seek to estimate. 
In Section |II C[ we review the LT statistical estimators 
and how the Fisher information matrix is used to calcu- 
late statistical errors on power spectrum measurements. 
Then in |IID| we explain the LT model of noise and fore- 
grounds in order to motivate and justify our refinements 



that will greatly speed up the algorithm in Section III 



A. Data Organization and Conventions 

We begin with a grid of data that represents the bright- 
ness temperatures at different positions on the sky as a 
function of frequency from which we wish to estimate 
the 21 cm brightness temperature power spectrum. We 
summarize that information using a data vector x which 
can be thought of as a one dimensional object of length 
n x n y n z = iVrlthe number of voxels in the data cube. 

Although the LT technique works for arbitrary sur- 
vey geometries, we restrict ourselves to the simpler case 
of a data "cube" that corresponds to a relatively small 
rectilinear section of our universe of size £ x x £ y x £ z in 
comoving coordinates]^] We pick our box to be a subset 
of the total 21 cm brightness temperature 3D map that 
a large interferometric observatory would produce. Un- 
like the LT method, our fast method requires that the 
range of positions on the sky must be small enough for 
the flat sky approximation to hold (Figure [TJ . Similarly, 
our range of frequencies (and thus redshifts) in the data 
cube must correspond to an epoch short enough so that 
P(fc, z) might be approximated as constant in time. Fol- 
lowing simulations by [33]) [20] argued that we can con- 
servatively extend the redshift ranges of our data cubes 
to about Az < 0.5. At typical EoR redshifts, such a 
small range in Az allows a very nearly linear mapping 
between the frequencies measured by an interferometer 
to a regularly spaced range of comoving distances, dc{z), 
although in general dc(z) is not a linear function of z or 
v. This also justifies the approximation that our data 
cube corresponds to an evenly partitioned volume of our 
universe. 



1 While it is helpful to think of x as a vector in the matrix opera- 
tions below, it is important to remember that the index i in Xi, 
which refers to the different components of x, actually runs over 
different values of the spatial coordinates x, y, and z. 

2 This restriction and its attendant approximations lie at the heart 
of our strategy for speeding up these calculations, as we explain 
in Section II II I 
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where x(k) is the Fourier transformed brightness temper- 
ature field and where angle brackets denote the ensemble 
average of all possible universes obeying the same statis- 
tics. 

Our choice of pixelization determines the relationship 
between the continuous power spectrum, -P(k), and the 
21 cm signal covariance matrix, which we call S for 
Signal. It is fairly straightforward to show, given Equa- 
tion[l]and the definition of the power spectrum |45j . that: 



Si 



[XiXj 



(Xi)(Xj 



^(k)V4(k)P(k) 



where ipi(k) is the Fourier transform of ipi(v): 



(4) 



(5) 



Separating this integral into each of the three Cartesian 
coordinates and integrating yields 



FIG. 1. This exaggerated schematic illustrates the flat sky ap- 
proximation. It shows great circles (colored and dashed) ap- 
proximated linearly in the region considered, with lines trac- 
ing back to the observer treated as if they were parallel. Our 
data cube contains the measured brightness temperatures for 
every small voxel. 



If the measured brightness temperatures, Xi, were only 
the result of redshifted 21 cm radiation, then each mea- 
surement would represent the average value in some small 
box of volume AxAyAz centered on i\ of a continuous 
brightness temperature field a;(r) [45] : 



ijji(r)x(r)d 3 r, 



(1) 



where our discretization function ipi is defined as ipi(r) 
V>o(r- Ti), where 



n(^)n(^)n( £ ) 

AxAyAz 



(2) 



and where H(x) is the normalized, symmetric boxcar 



function (II(a;) 



1 if 



< \ and otherwise). This 



choice of pixelization encapsulates the idea that each 
measured brightness temperature is the average over a 
continuous temperature field inside each voxel. In this 
paper, we improve on the LT method by including the 
effect of finite pixelation. This will manifest itself as an 
extra ^(k) term that will we define in Equation [6] and 
that will reappear throughout this paper. 
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r; $(k), where 
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(6) 



where jo(x) — sinx/x is the zeroth spherical Bessel func- 
tion. Because we can only make a finite number of mea- 
surements of the power spectrum, we parametrize and 
discretize P(k) by approximating it as a piecewise con- 
stant function: 



P(k) » 5> Q x a (k), 



(7) 



where the "band power" p a gives the power in region 
a of Fourier spacej^j specified by the characteristic func- 
tion x° (k) which equals 1 inside the region and vanishes 
elsewhere. 

Combining Equations [4] and [7] we can write down Sy : 



Su = *^2p a Q?j, where 



rfik 



(8) 



We choose these x a (k) to produce band powers that re- 
flect the symmetries of the observation. Our universe is 
isotropic in three dimensions, but due to redshift space 
distortions, foregrounds, and other effects, our measure- 
ments will be isotropic only perpendicular to the line of 
sight [201 . This suggests cylindrical binning of the 
power spectrum; in the directions perpendicular to the 



B. The Discretized 21 cm Power Spectrum 



Ultimately, the goal of this paper is to estimate the 21 
cm power spectrum -P(k), defined via 



(F(k)x(k')) = (2^) 3 £(k-k')P(k), 



(3) 



3 In contrast to lowered Latin indices, which we use to pick out 
voxels in a real space or Fourier space data cube, we will use 
raised Greek indices to pick out power spectrum bins, which will 
generally each run a range in kn and in k±. 
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line of sight, we bin k x and k y together radially to get 
a region in k-space extending from from kj_ — Afcj_/2 to 
fe" + Ak±/2 where k\ = k^ + ky. Likewise, in the direc- 
tion parallel to the line of sight, we integrate over a region 
of k-space both from k^ — Afcy /2 to fcjj* + Afcy /2 and, be- 
cause the power spectrum only depends on fcii = \k z \, 
from -fcff + Afcn/2 to -fefi" - Afcn/2. Therefore, we have 



)Q . = 1 
'« (2tt) 3 

Jkf -Afcj_/2 



fe^+Afe||/2 /•- fc|-Afe||/2" 
fc? — Afc||/2 v/-fc°+Afej|/2 

|$(k)|V k -( r4 - r ^fc x <20dfc ± dfcn. 



(9) 



Without the factor of |$(k)| 2 , the LT method was able 
to evaluate this integral analytically. With it, the inte- 
gral must be evaluated numerically if it is to be evaluated 
at all. This is of no consequence; we will return to this 
formula in Section [lII B| to show how the matrix Q Q natu- 
rally lends itself to approximate multiplication by vectors 
using fast Fourier techniques. 



C. 21 cm Power Spectrum Statistics 

In order to interpret the data from any experiment, 
we need to be able to estimate both the 21 cm brightness 
temperature power spectrum and the correlated errors 
induced by the survey parameters, the instrument, and 
the foregrounds. The LT method does both at the same 
time; with it, the calculation of the error bars immedi- 
ately enables power spectrum estimation. 



1. Inverse Variance Weighted Power Spectrum Estimation 

The LT method adapts the inverse variance weighted 
quadratic estimator formalism [501 [5T] for calculating 21 
cm power spectrum statistics. The first step towards 
constructing the estimator jP for p a is to compute a 
quadratic quantity, called g° whose relationship^] to jP 
we will explain shortly: 

$ Q ^^(x-(x» T C- 1 Q«C- 1 (x-(x)). (10) 

Here C is the covariance matrix of x, so 

C^(xx T }-(x)(x) T . (11) 

For any given value of a, the right-hand side of Equa- 
tion ( 10 1 yields a scalar. Were both our signal and fore- 



sense that it preserves all the cosmological information 
contained in the data. Of course, with a non-Gaussian 
signal, the power spectrum cannot contain all of the in- 
formation, though it still can be very useful |50) . 

Our interest in the quadratic estimators cP lies in their 
simple relationship to the underlying band powers. In 
1501. it is shown that: 



Fp + b 



(12) 



where each b a is the bias in the estimator and F is the 
Fisher information matrix, which is related to the proba- 
bility of having measured our data given a particular set 
of band powers, /(x|p Q ). The matrix is defined [52] as: 



pa/3 



d 2 ln/(x|p Q ) 
dp a dpP 



(13) 



The LT method employs the estimators by calculating 
both F and b using relationships derived in 50J : 

F af3 = itr[C- 1 Q°C- 1 Q^] and (14) 

b a = -tr[(C - S)C- 1 Q"C- 1 ]. (15) 

We want our jP to be unbiased estimators of the true 
underlying band powers, which means that we will have 
to take care to remove the biases for each band power, 
b a . We construct our estimator^] as linear combinations 
of the quadratic estimators (P that have been corrected 
for bias: 



P = M(q b), 



(16) 



where M is a matrix which correctly normalizes the 
power spectrum estimates; the form of M represents a 
choice in the trade-off between small error bars and nar- 
row window functions, as we will explain shortly. 

How do we expect this estimator to behave statisti- 
cally? The only random variable on the right hand side 
of Equation 16 is q, so we can combine Equations 12 and 



16 to see that our choice of p indeed removes the bias 
term: 



(p) = MFp + Mb Mb = MFp = Wp. 



(17) 



We have defined the matrix of "window functions" W = 
MF because Equation [17] tells us that we can expect 
our band power spectrum estimator, p, be be a weighted 
average of the true, underlying band powers, p. That 
definition imposes the condition on W that 



grounds Gaussian, this estimator would be optimal in the 







i 



(18) 



4 Unlike the notation in LT, we do not include the bias term in cp 
but will later include it in our power spectrum estimator. The 
result is the same. 



5 Here were differ slightly from the LT method in the normaliza- 
tion, which does not have the property from Equation |18| We 



instead follow I53| . 
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which is equivalent to the statement that the weights in 
a weighted average must add up to one. The condition 
on W constrains our choice of M, though as long as M 
is an invertible matrix]^] the choice of M does not change 
the information content of our power spectrum estimate, 
only the way we choose to represent our result. 



2. Window Functions and Error Bars 

In this paper, we choose a form of p where M oc F -1 / 2 . 
Two other choices for M are presented in [53] : one where 
M oc I and another where M oc F _1 . The former pro- 
duces the smallest possible error bars, but at the cost of 
wide window functions and correlated measurement er- 
rors. The latter produces <5-function windows, but large 
and anticorrelated measurement errors. This choice of 
M oc F" 1 / 2 has proven to be a happy medium be- 
tween those other two choices for M. It produces rea- 
sonably narrow window functions and reasonably small 
error bars which have the added advantage of being com- 
pletely uncorrelated, so that each measurement contains 
a statistically independent piece of information. Because 
W = MF and because of the condition on W in Equa- 
tion [181 there is only one such M: 



Ct0 



M 



a/3 _ 



(p-1/2 



(19) 



With this choice of M we get window functions of the 
form 



(pl/2)a/3 



(20) 



which we can use to put "horizontal error bars" on our 
power spectrum estimates. 



Using Equation 16 and the fact derived in [50] that an 
equivalent formula for F is given by 



F=(qq T )-(q)(q) T , 



(21) 



we can see that the covariance of p takes on a simple 
form: 



(pp T )-(p)(p) T = MFM T . 



(22) 



This allows us to write down the "vertical error bars" on 
our individual power spectrum estimates: 



Ap° = [(MFM T )' 



1/2 



1 



E-y( Fl/2 )"~ 



(23) 



6 None of the choices of M involve anything more computationally 
intensive than inverting F. This is fine, since F is a much smaller 
matrix than C. 



As in LT, we can transform our power spectrum estimates 
and our vertical error bars into temperature units: 



rpo 



{klfk a 
2tt 2 



and likewise, 



AT° 



1 1/2 



(24) 



2tt 2 [E^F^y 



1/2 



(25) 



This makes it easier to compare to theoretical predic- 
tions, which are often quoted in units of K or mK. 



D. Foreground and Noise Models 

The structure of the matrix C that goes into our in- 
verse variance weighted estimator depends on the way 
we model our foregrounds, noise, and signal. We assume 
that those contributions are the sum of five uncorrelated 
components: 



c £ components 



;x c xj) - (x c )( Xc ) T 



= S + R + U + G + N. 



(26) 



These are the covariance matrices due to 21 cm 
Signal, bright point sources Resolved from one another, 
Unresolved point sources, the Galactic synchrotron, and 
detector Noise, respectively. This deconstruction of C is 
both physically motivated and will ultimately let us ap- 
proximate C _1 (x— (x)) much more quickly than by just 
inverting the matrix. 

Following LT, we neglect the small cosmological S be- 
cause it is only important for taking cosmic variance into 
account. It is straightforward to include the S matrix 
in our method, especially because we expect it to have a 
very simple form, but this will only be necessary once the 
experimental field moves from upper limits to detection 
and characterization of the 21 cm brightness temperature 
power spectrum. 

In this paper, we will develop an accelerated version of 
the LT method using the models delineated in LT. That 
speed-up relies on the fact that all of these covariance 
matrices can be multiplied by vectors O(NlogN) time. 
However, our techniques for acceleration will work on a 
large class of models for C as long as certain assumptions 
about translation invariance and spectral structure are 
respected. In this section, we review the three contami- 
nant matrices from LT: U, G, and N. When we discuss 
methods to incorporate these matrices into a faster tech- 
nique in Section [lIID[ we will also expand the discussion 
of foregrounds to include R, which is a natural extension 
of U. 
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Resolved 
Point Sources 



Unresolved Galactic 
Point Sources Synchrotron 






Detector Noise 



FIG. 2. These example data cubes (with the line of sight drawn vertically) illustrate the strong or weak correlations between 
different voxels in the same cube. In Section |III F| we explain how these simulated data cubes are generated quickly. The 
addition of resolved point sources, which is not included in LT, is discussed in Section [ill D 1| To best exemplify the detailed 
structure of the models, the color scales are different for each of the cubes. 



1. Unresolved Point Sources 



we get a covariance parallel to the line of sight of 



For a typical current generation or near future exper- 
iment, the pixels perpendicular to the line of sight are 
so large that every one is virtually guaranteed to have 
a point source in it bright enough to be an important 
foreground to our 21 cm signal. These confusion limited 
point sources are taken into account using their strong 
correlations parallel to the line of sight and weaker cor- 
relations perpendicular to the line of sight, both of which 
are easily discerned in Figure [2] 

Following LT we split U into the tensor product of two 
parts, one representing correlations perpendicular to the 
line of sight and the other parallel to the line of sight: 



U = U, 



U,i 



(27) 



Covariance perpendicular to the line of sight is modeled 
as an unnormalized Gaussian: 



(U±,)ij = cxp 



((r±),-(r^r 
2a\ 



(28) 



where a± represents the correlation length perpendicu- 
lar to the line of sight. Following LT, we take this to 
be a comoving distance corresponding to 7 arcminutes, 
representing the weak clustering of point sources. 

The covariance along the line of sight assumes a Pois- 
son distributed number of point sources below some flux 
cut, Scut, which we take to be 0.1 Jy, each with a spectral 
index drawn from a Gaussian distribution with mean R 
and standard deviation <j K . Given a differential source 
count [531 of 



dn 
dS 



=(4000 Jy-V-^x 



( s ) 

\ 0.880 Jy J 



-2.51 



1( 



0.880 Jy 



-1.75 



for S > 
for S < 



Jy 
Jy, 



(29) 



(U^ =(1.4 xlO" 3 K) 2 (^r 2 -"(^) 



exp 



f (Mw&)) s 



h{S c 



(30) 



where we have assumed a power law spectrum for the 
point sources where rji = Vi/v*, v* = 150 MHz, and k 
and a K are the average value and standard deviation of 
the distribution of spectral indices of the point sources. 
We define l2{S cut ) as 



l2{S C ut) = 



cut rln 



(31) 



Following LT, we take 7t = 0.5 and a K ~ 0.5, both of 
which are consistent with the results of [29]. In Section 
|IIID 2[ we will return to Equation [30] and show how it 
can be put into an approximate form that can be quickly 
multiplied by a vector. 



2. Galactic Synchrotron Radiation 

Following LT, we model Galactic synchrotron emission 
in the same way that we model unresolved point sources. 
Fundamentally, both are spatially correlated synchrotron 
signals contributing to the brightness temperature of ev- 
ery pixel in our data cube. However the galactic syn- 
chrotron is much more highly correlated spatially, which 
can be clearly seen in the sample data cube in Figure 
[2] This leads to our adoption of a much larger value of 
a± ; we take a± to be a comoving distance corresponding 
to 30° on the sky. Following LT, we take 7t = 0.8 and 
<r K = 0.4. 

This is an admittedly crude model for the galactic syn- 
chrotron, in part because it fails to take into account the 
roughly planar spatial distribution of the Galactic syn- 
chrotron. A more sophisticated model for G that in- 
corporates a more informative map of the Galactic syn- 
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chrotron can only produce smaller error bars and nar- 
rower window functions. However, such a model might 
involve breaking the assumption of the translational in- 
variance of correlations, which could be problematic for 



the technique we use in Section III to speed up this algo- 
rithm. In practice, we expect very little benefit from an 
improved spatial model of the Galactic synchrotron due 
to the restriction imposed by the flat sky approximation 
that our map encompass a relatively small solid angle. 



to store and computationally infeasible to invert. How- 
ever, we need to be able to multiply by and often invert 
these large matrices to calculate our quadratic estimators 
(Equations [9| and [l0| , the Fisher matrix (Equation 14 1, 
and the bias (Equation 15 1. A 10 6 voxel data cube, for 
example, would take 0(1(F 8 ) computational steps to an- 
alyze. This is simply infeasible for next-generation radio 
interferometers and we have therefore endeavored to find 
a faster way to compute 21 power spectrum statistics. 



Instrumental Noise 



III. OUR FAST METHOD 



Here we diverge from LT to adopt a form of the noise 
power spectrum from [55] that is more readily adaptable 
to the pixelization scheme we will introduce: 



P iV (k,A) 



B- 2 (k,A). 



(32) 



Here T sys is the system temperature (which is sky noise 
dominated in our case), A is the total effective collect- 
ing area of the array, and r is the is the total observing 
time. B(k, A) is a function representing the w-coverage, 
normalized to peak at unity, which changes with wave- 
length. Lastly, y is the conversion from bandwidth to 
the comoving length of the box parallel to the line of 
sight and c?m is the transverse comoving distanc^J so 
yd^Qpi^Av — AxAyAz with f2 p i x being the angular size 
of our pixels and Av being the frequency channel width. 
This form of the noise power spectrum assumes that the 
entire map is observed for the same time r, which is why 
the ratio of the angular size of the map to the field of 
view does not appear. 

We use Equation [4] to discretize the power spectrum 
and get N: 



?k-r, — ik-r. 

e l e 3 



|$(k)|'P"(k,A) 



d 3 k 
(2?r) 3 



(33) 



Instead of evaluating this integral, we will show in Section 
|IIID 3| that it can be approximated using the discrete 
Fourier transform. 



E. Computational Challenges to the Brute Force 
Method 



For a large data cube, the LT method requires the 
application of large matrices that are memory-intensive 



To avoid the computational challenges of the LT 
method, we seek to exploit symmetries and simpler forms 
in certain bases of the various matrices out of which we 
construct our estimate of the 21 cm power spectrum and 
its attendant errors and window functions. In this sec- 
tion, we describe the mathematical and computational 
techniques we employ to create a fast and scalable algo- 
rithm. 

Our fast method combines the following six separate 
ideas: 

1. A Monte Carlo technique for computing the Fisher 



information matrix and the bias (Section III A) 



2. An FFT-based technique for computing band pow- 
ers using the Q" matrices (Section III B I . 



3. An application of the conjugate gradient method 
that eliminates the need to invert C (Section III C I . 



A Toeplitz matrix technique for multiplying vectors 
quickly by the constituent matrices of C (Section 

Wd\. 



5. A combined FFT and spectral technique for precon- 
ditioning C to improve converge of the conjugate 
gradient method (Section HIE I 



6. A technique using spectral decomposition and 
Toeplitz matrices for rapid simulation of data cubes 
for our Monte Carlo (Section IIIF). 



In this Section, we explain how all six are realized and 
how they fit into our fast method for power spectrum 
estimation. Finally, in Section [ill G| we verify the algo- 
rithm in an end-to-end test. 



Monte Carlo Calculation of the Fisher 
Information Matrix 



7 The transverse comoving distance, cLm{z), is the ratio of an ob- 
ject's comoving size the angle it subtends, as opposed to the 
angular diameter distance, d^{z), which is the ratio of its phys- 
ical size to the angle it subtends. It is sometimes called the 
"comoving angular diameter distance" and it is even sometimes 
written as d^z). See [56] for a helpful summary of these often 
confusingly named quantities. 



In order to turn the results of our quadratic estimator 
into estimates of the power spectrum with proper vertical 
and horizontal error bars, we need to be able to calculate 
the Fisher information matrix and the bias term. Instead 
of using the form of F in Equation[l4]that the LT method 
employs, we take advantage of the relationship between 



F and q in Equation |2lJ that F = (qq T > - (q)(q) T . If 
we can generate a large number of simulated data sets 
x drawn from the same covariance C and then compute 
q from each one, then we can iteratively approximate F 
with a Monte Carlo. In other words, a solution to the 
problem of quickly calculating q also provides us with a 
way to estimate F. What's more, the solution is trivially 
parallelizable; creating artificial data cubes and analyz- 
ing them can be done by many CPUs simultaneously. 

In calculating F, we can get b a out essentially for free. 
If we take the average of all our q vectors, we expect to 
that 



1 



= ^-(x-(x))'C- 1 Q«C- 1 (x-(x)) 

= tr [((x-(x))(x-(x)) T ) C^CTC- 1 ] 

= tr [CTCT 1 ] = b a (34) 



in the limit where S is negligibly small. This implies that 
p can be written in an even simpler way: 



— < t 



1 



(35) 



where, recall, F is calculated as the sample covariance of 
our q vectors. We therefore can calculate all the compo- 
nents of our power spectrum estimate and its error bars 
using a Monte Carlo. 

In Section fill Gl we will return to assess how well the 
Monte Carlo technique works and its convergence proper- 
ties. But first, we need to tackle the three impediments 
to computing in Equation 10 quickly: generating a 
random x drawn from C, computing C _1 (x — (x)), and 
applying Q". 



B. Fast Power Spectrum Estimation Without 
Noise or Foregrounds 

If we make the definition that 

y^C- 1 (x-(x» (36) 
to simplify Equation [H)| to 

q a = y T Q Q y, (37) 

we can see that even if we have managed to calculate y 
quickly, we still need to multiply it by a N x N element 
Q a matrix for each band power a. Though each Q Q re- 
spects translation invariance that could make multiply- 
ing by vectors faster, there exists an even faster technique 
that can calculate every entry of p simultaneously using 
fast Fourier transforms. 

To see that this is the case, we substitute Equation [9] 
into Equation 37 reversing the order of summation and 



integration and factoring the integrand: 



0° 



1 


" /•fcj j , +Afc | |/2 


f~ k ° ~ Afc ll/ 2 


2 


J A:° — A.fe|| /2 


/-fc°+Afc||/2 



fc°+Afc i /2 
$(k) 



E 



Vie 



3 



2 k±d6dk±dk\\ 



(2tt) 3 



(38) 



The two sums inside the integral are very nearly discrete, 
3D Fourier transforms. All that remains is to discretize 
the Fourier space conjugate variable k as we have already 
discretized the real space variable r. 

In order to evaluate the outer integrals, we approxi- 
mate them as a sum over grid points in Fourier space. 
The most natural choice for discretization in k is one 
that follows naturally from the FFT of y in real space. 
If our box is of size £ x £ y £ z arL d broken into n x x n y x n z 
voxel^f]we have that 

3x&x jyV-y jz^z j ,^gs 

where j x ,j y ,j z G j f-, ...,0, — ^ - lj . 

The natural 3D Fourier space discretization is 



2irm x 2irm v 2Trm z 



f n x 

where m x , m y , m z € | 



y.z 



...,0, 



with a Fourier space voxel volume 



2tt 2tt 2tt 
(Ak) = — x — x — . 

K/t. 



(40) 

l x,y,z _ ^\ 

2 ) 

(41) 



With this choice of discretization, we will simplify our 
integrals by sampling Fourier space with delta functions, 
applying the approximation in the integrand of Equation 
El that 



y, (2^) 3 J 3 (k-k m ) 



in 



9 9 9 



(42) 



This simplifies Equation |38| considerably: 
1 I x- 



5°= 2^\^ Vit 



3 



X a (k m )|$(k m )| 2 /(4V Z 



(43) 



8 For simplicity and consistency we assume that n x , n y , and n z 
are all even and we take the origin the to be the second of the 
two center bins. 



9 



•""■TV 



If we define y m = Ui e m ' Tj , then we can write q as: 
^^&X a (k m )|$(k m )| 2 

y m 

= WTT E ly™| 2 X Q (k m )|<f (k m )| 2 . (44) 

a m 

This result makes a lot of sense: after all, the power 
spectrum is — very roughly speaking — the data Fourier 
transformed, squared, and binned with an appropriate 
convolution kernel. 

This is a very quick way to calculate q because we 
can compute y in 0(N log N) time (if we already have 
y) and then we simply need to add \y m \ 2 for every to, 

2 
2 



weighted by the value of the analytic function |$(k„ 
to the appropriate band power aFj Each value of \y. 
gets mapped uniquely to one value of a, so there are 
only N steps involved in performing the binning. Unlike 
in the LT method, we perform the calculation of g° for 
all values of a simultaneously. 

However, the FFT approximation to Qfj from Equa- 



tion 



42 



does not work very well at large values of (r, — Yj ) 
because the discrete version of Q Q does not sample the 
continuous version of Q" very finely. This can be im- 
proved by zero padding the input vector y^, embedding 
it inside of a data cube of all zeros a factor of ( 3 larger. 
For simplicity, we restrict £ to integer values where C = 1 
represents no zero padding. By increasing our box size, 
we decrease the step size in Fourier space and thus the 
distance between each grid point in Fourier space where 
we sample k with delta functions. Repeating the deriva- 
tion from Equations [39] through [44] yields: 



1 



J2\y m \ 2 x a (Kn)Mk m )\ 2 , (45) 



where y has been zero padded and then Fourier trans- 
formed. This technique of power spectrum estimation 
scales as 0(( 3 N\og N), which is fine as long as ( is 
small0 In Figure M we see how increasing £ from 1 to 5 
greatly improves accuracy. 



Inverse Variance Weighting with the Conjugate 
Gradient Method 



that we can also calculate y 



q" quickly provided 
= C- 1 (x - (x)) quickly. 



For simplicty, we choose band power spectrum bins with the same 
width as our Fourier space bins (before zero padding). This linear 
binning scheme makes plotting, which is typically logarithmic in 
the literature, more challenging. On the other hand, it better 
spreads out the number of Fourier space data cube bins assigned 
to each band power. 

Though not the computational bottleneck, this step is the most 
memory intensive; it involves writing down an array of C^N 
double-precision complex numbers. This can reach into the gi- 
gabytes for very large data cubes. 
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FIG. 3. We use an FFT-based technique to approximate the 
action of the matrix Q a that encodes the Fourier transform- 
ing, binning, and pixelization factors. In this Figure, we show 
how the approximation improves with different factors of the 
zero padding parameter, £, while varying a single coordinate 
of one of the Q a matrices. For a fairly small value of £, the 
approximation is quite good, meaning that the binning and 
Fourier transforming step contributes subdomninantly to the 
complexity of the overall algorithm. 



The latter turns out to be the most challenging part of 
the problem; we will address the various difficulties that 



it presents in this Section through Section [Til E| We take 
our inspiration for a solution from a similar problem that 
the WMAP team faced in making their maps. They em- 
ployed the preconditioned conjugate gradient method to 
great success [57] EH] ■ 

The conjugate gradient method [59] is an iterative 
technique for solving a system of linear equations such 
as Cy = (x — (x) ) . Although directly solving this system 
involves inverting the matrix C, the conjugate gradient 
method can approximate the solution to arbitrary pre- 
cision with only a limited number of multiplications of 
vectors by C. If we can figure out a way to quickly mul- 
tiply vectors by C by investigating the structure of its 
constituent matrices, then we can fairly quickly approx- 
imate y. We will not spell out the entire algorithm here 
but rather refer the reader to the helpful and comprehen- 
sive description of it in [60] . 

Whenever iterative algorithms are employed, it is im- 
portant to understand how quickly they converge and 
what their rates of convergence depend upon. If we are 
trying to achieve an error e on our approximation ycGM 
to y where 



|Cy C GM - (x - (x) 
|x-(x)| 

,1/2 



(46) 



and where |x| = (Xi t ^i)''' is the length of the vector 
x, then the number of iterations required to converge 
(ignoring the accumulation of round-off error) is bounded 
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by 



1 r-, ( 2 

n < — v re In - 
" 2 V U 



(47) 



where re is the condition number of the matrix (not to be 
confused with re used elsewhere as a spectral index), de- 
fined as the ratio of its largest eigenvalue to its smallest: 



re(C) 



Amax(C) 
Amin(C) 



(48) 



Because n only depends logarithmically on e, the con- 
vergence of the conjugate gradient method is exponen- 
tial. In order to make the algorithm converge in only a 
few iterations, it is necessary to ensure that re is not too 
large. This turns out to be a major hurdle that we must 
overcome, because we will routinely need to deal with 
covariance matrices with re(C) « 10 s or worse. This 
dynamic range problem is unavoidable; it comes directly 
from the ratio of the brightest foregrounds, typically hun- 
dreds of kelvin, to the noise and signal, typically tens of 
millikelvin. That factor, about 10 , enters squared into 
the covariance matrices, yield ing condition numbers of 
roughly 10 8 . In Section HIE we will explain the efforts 
we undertake to mitigate this problem. 



D. Foreground and Noise Covariance Matrices 

Before we can go about ensuring that the conjugate 
gradient method converges quickly, we must understand 
the detailed structure of the constituent matrices of C. 
In particular, we will show that these matrices can all 
be multiplied by vectors in 0(N log N) time. We will 
first examine the new kind of foreground we want to in- 
clude, resolved point sources, which will also provide a 
useful example for how the foreground covariances can 
be quickly multiplied by vectors. 



1. Resolved Point Sources 

Unlike LT, we do not assume that bright point sources 
have already been cleaned out of our map. Rather we 
wish to unify the framework for accounting for both re- 
solved and unresolved foregrounds by inverse covariance 
weighting. This will allow us to directly calculate how 
our uncertainties about the fluxes and spectral indices of 
these point sources affect our ability to measure the 21 
cm power spectrum. 

In contrast to the unresolved point sources modeled by 
U, we model Nr bright resolved point sources as having 
known position^] with different fluxes S n (at reference 
frequency z/* ) and spectral indices re„ , neither of which is 



known perfectly. We assume that resolved point source 
contributions to x are uncorrelated with each other, so we 
can define an individual covariance matrix R„ for each 
point source. This means that our complete model for R 
is: 



R = R„ . 



(49) 



Following LT, we can express the expected brightness 
temperature in a given voxel along the line of sight of 
the n th point source by a probability distribution for flux, 
Ps n (S')y an d spectral index, p Kn (re'), that are both Gaus- 
sians with means S n and re„ and standard deviations as n 
and <7 Kn , respectively. Following the derivation in LT, 
this yields: 



(xi) n = k in (1.4 x 10" 3 K) 

s- 



/>/.r 



1 Jy 

= S iin (lAx 10" 3 K) 



1 sr 

Ps„(S')dS' 



Vi 2 x 



'p Kn ( K ')dK' 



lJy 



1 sr 



l exp 



(50) 



where again rji = vi/v*. Here 8u n is a Kronecker delta 
that forces (xi) to be zero anywhere other than the line 
of sight corresponding to the n th resolved point source. 
Likewise, we can write down the second moment: 

(xiXj) = S u J jjn (lA x 10- 3 K) 2 (j7ir?j)- 2 x 




! ps n (S') d S>) x 



(ViVj) K p K „(re')dre' 



a«_*«,(1.4 x lO^K) 2 ^-) -2- "" 



's„ 



(i Jy) 2 

r 2 



1 sr 



exp 



5% of S n and a Kn 0.2. 



(51) 



where we assume as n 
We know that (x i ) i 
a vector because it is a rank 1 matrix. Therefore, in order 



can be quickly multiplied by 



If the data cube is not overresolved, this assumption should be 



pretty good. If a point source appears to fall in two or more 
neighboring pixels, it could be modeled as two independent point 
sources in this framework. An even better choice would be to 
include the correlations between the two pixels, which would be 
quite strong. Modeling those correlations could only improve the 
results, since it would represent including additional information 
about the foregrounds, though it might slow down the method 
slightly. Not accounting for position uncertainty will cause the 
method to underestimate the "wedge" feature |7I I61tl63) . 
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to show that all of R can be quickly multiplied, we recast 
(xiXj) n as the product of matrices that can be multiplied 
by a vector in 0(N log N) or faster. If we then ignore the 
constants and just look at the parts of this matrix that 
depend on coordinates, we have that: 



(xiXj) oc 5 ii J jjn {rii'q j y 



exp 



-yKhiWj) 2 



exp — 

$nMT 2 ~ Kn exp^Jln^) 2 ] 



2 V m 



(52) 



This matrix can be separated into the product of three 
matrices: one diagonal matrix that only depends on r]i, 
an inner matrix that includes the logarithm of a quotient 
of rji and rjj in the exponent, and another diagonal ma- 
trix that only depends on rjj . The diagonal matrices can 
be multiplied by a vector in 0(n z ). Moreover, because 
our cubes have redshift ranges Az < 0.5 the frequencies 
at i and j are never very far apart, we can make the 
approximation that: 



In 



Vj 



In 



V 3, 



isq + Avj 



1 

^0 



(Ai/i-Ai/,-) (53) 



where Ai^ = i/j — v and u is a constant reference fre- 
quency close to both v\ and Vj, We choose the center 
frequency of the data cube to be vq. We can see now 
by combining Equations [52] and [53] that the inner ma- 
trix in our decomposition of the second moment depends 
only on the magnitude of the difference between Vi and 
Vj. In the approximation that the physical size of the 
data cube is small enough that frequencies map linearly 
to distances, this shows that R„ respects translational 
invariance along the line of sight. 

Because the entries in this inner part of R„ only de- 
pend on differences in frequencies, the inner matrix is a 
diagonal-constant or "Toeplitz" matrix. Toeplitz matri- 
ces have the fortuitous property that they can be mul- 
tiplied by vectors in 0{N\ogN), as we explain in Ap- 
pendix [X] Therefore, we can multiply R„ by a vector in 
0(n z log n z ) and we can multiply R by a vector faster 
than O (N log N). 

We can understand this result intuitively as a conse- 
quence of the fact that the inner part of R„ is transla- 
tionally invariant along the line of sight. Matrices that 
are translationally invariant in real space are diagonal in 
Fourier space. That we need to utilize this trick involving 
circulant and Toeplitz matrices is a consequence of the 
fact that our data cube is neither infinite nor periodic. 



Unresolved Point Sources and the Galactic Synchrotron 
Radiation 



Let us now take what we learned in Section III D 1 (and 
AppendixjA]) to see if U can also be quickly multiplied by 
a vector. Looking back at Equation [28] we can see that 
our job is already half finished; (U±)ij only depends on 
the absolute differences between and {r±)j. Like- 

wise, we can perform the exact same trick we employed 



in Equations 52 and 53 to write down the relevant parts 
of (U\\)ij from Equation 30 with the approximation that 



Avi is always small relative to vq: 



{U\\)ij oc exp 



Av, 



(54) 



In fact, we can decompose U as a tensor product of 
three matrices sandwiched between two diagonal matri- 
ces: 



U = D U [U X ®U 3/ ®U Z ]D U . 



(55) 



where all three inner matrices are Toeplitz matrices. 
When we wish to multiply U by a vector, we simply pick 
out one dimension at a time and multiply every segment 
of the data by the appropriate Toeplitz matrix (e.g. ev- 
ery line of sight for \J Z ). All together, the three sets 
of multiplications can be done in O(n x n y nf\ogrif) + 
0(n x n f n y log n y ) + 0(n y n f n x log n x ) = O(NlogN) 
time. 

Moreover, since G has exactly the same form as U, al- 
beit with different parameters, G too can be multiplied 
by a vector in O(NlogN) time by making the same ap- 



proximation that we made in Equation 53 



3. Instrumental Noise 

Lastly, we return now to the form of N we introduced 
in Section II D 3| To derive a form we combine Equations 
[32] and [33 The details are presented in Appendix [B] so 
here we simply state the result: 



N = F^NFj,, 



(56) 



where Fj_ and ~F\ are the unitary discrete 2D Fourier 
and inverse Fourier transforms and where: 



N,„ 



\ 4 Ti ys f (k x ^Ax/2)f (k y ,Ay/2)S lr , 



^ant(^pix) 2r 



,Av 



(57) 



Here, A ari t is the effective area of a single antenna, Av 
is the frequency channel width, I and m are indices that 
index over both uw-cells and frequencies, and U is the to- 
tal observation time in a particular wii-cell at a particular 
frequency. 

Because this matrix is diagonal, we have therefore 
shown that N, along with R, U, and G, can be mul- 
tiplied by a vector in O (TV log N). We have summarized 
the results for all four matrices in Table [H 
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Covariance Matrix 


Parallel to the Line of Sight 


Perpendicular to the Line of Sight 


R: Resolved Point Sources 


Toeplitz symmetry 


Diagonal in real space 


U: Unresolved Point Sources 


Toeplitz symmetry 


Toeplitz symmetry 


G: Galactic Synchrotron Radiation 


Toeplitz symmetry 


Toeplitz symmetry 


N: Instrumental Noise 


Diagonal in real space 


Diagonal in Fourier space 



TABLE I. Due to the symmetries or approximate symmetries our models for the foreground and noise covariance matrices, 
they can all be multiplied by a vector in O(TVlogiV) time or faster. Summarized above are the reasons why each matrix can 
be quickly multiplied by. Either the matrices respect translation invariance and thus Toeplitz symmetry, their components 
are uncorrelated between lines of sight or frequencies, making them diagonal in real space, or they are uncorrelated and thus 
diagonal in Fourier space. In each case, the symmetries rely on the separabiltiy of the modeled covariance matrices into the 
tensor product of parts parallel or perpendicular to the line of sight. 



4- Eliminating Unobserved Modes with the Psuedo-Inverse 

In our expression for the noise covariance in Equation 
[57} we are faced with the possibility that t\ could be 
zero for some values of I, leading to infinite values of 
Nij. Fourier modes with i; = correspond to parts of 
the ui>-plane that are not observed by the instrument, 
i.e. to modes containing no cosmological information. 
We can completely remove these modes by means of the 
"psuedo-inverse" [IS], which replaces C _1 in the expres- 
sion C _1 (x — (x)) and optimally weights all observed 
modes (this removal can itself be thought of as an op- 
timal weighting — the optimal weight being zero). The 
psuedo-inverse involves II, a projection matrix (IP = II 
and II 2 = II) whose eigenvalues are for modes that 
we want to eliminate and 1 for all other modes. It can 
be shown [35] that the quantity we want to calculate for 
inverse variance weighting is not C _1 (x — (x)) but rather 
the quantity where: 



n[ncn + 7 (i n)] _1 n. 



(58) 



In this equation, 7 can actually be any number other than 
0. The term in brackets in the above equation replaces 
the eigenvalues of the contaminated modes of C with 7. 
The outer II matrices then project those modes out after 
inversion. In this paper, we take 7 — 1 as the convenient 
choice for the preconditioner we will develop in Section 

mm 

The ability to remove unobserved modes is also essen- 
tial for analyzing real data cubes produced by an interfer- 
ometer. Interferometers usually produce so-called "dirty 
maps," which are corrected for the effects of the primary 
beam but have been convolved by the synthesized beam, 
represented by the matrix B: 



-^dirty map 



Bx. 



(59) 



To compute x for our quadratic estimator, we need to 
invert B. Since the synthesized beam matrix is diagonal 
in Fourier space, this would be trivial were it not for 
unobserved baselines that make B uninvertable. This can 
be accomplished with the psuedoinverse as well, since the 
modes that would have been divided by when inverting 



B are precisely the modes that we will project out via 
the psuedoinverse. We can therefore comfortably take 



x = F^nrnBn 



where B = F^BFj_ and B is diagonal. 

The psuedo-inverse formalism can be usefully extended 
to any kind of mode we want to eliminate. One especially 
useful application would be to eliminate frequency chan- 
nels contaminated by radio frequency interference or ad- 
versely affected by aliasing or other instrumental issues. 



7 (I - II^IIFjXdir 



ty map: 



(60) 



E. Preconditioning for Fast Conjugate Gradient 
Convergence 

We have asserted that the quantity y = C _1 (x — (x)) 
can be estimated quickly using the conjugate gradient 
method as long as the condition number k(C) is rea- 
sonably small. Unfortunately, this is never the case for 
any realistic data cube we might analyze. In Figure [3] 
we plot the eigenvalues of C and its constituent matrices 
for a small data cube (only 6x6x8 voxels) taken from 
a larger, more representative volume. In this example, 
k(C) w 10 8 , which would cause the conjugate gradient 
method to require tens of thousands of iterations to con- 
verge. This is typical; as we discussed in Section III C 



values of around 10 8 are to be expected. We need to do 
better. 



1. The Form of the Preconditioner 

The core idea behind "preconditioning" is to avoid the 
large value of k(C) by introducing a pair of precondi- 
tioning matrices P and . Instead of solving the linear 
system Cy = (x — (x)), we solve the mathematically 
equivalent system: 



cy = p(x-<x», 



(61) 



where C = PCP^ and y' = (P^) _1 y. If we can compute 
P(x — (x)) and, using the conjugate gradient method 
on C, we can solve for y' and thus finally find y — 



13 




x Resolved Point Sources 
% Unresolved Point Sources 
+ Galactic Synchrotron Radiation 

Instrumental Noise 
— — Total Covariance (R+U+G+N) 



100 150 

N th Largest Eigenvalue 



200 



250 



300 



FIG. 4. The distinct patterns in the eigenvalue spectrum of our covariance matrix provide an angle of attack for making the 
calculation of C _1 (x — (x)) numerically feasible via preconditioning. The plotted eigenvalue spectra of the covariance for a 
very small data cube exemplifies many of the important characteristics of the constituent matrices. First, notice that the noise 
eigenvalue spectrum, while flatter than any of the others, is not perfectly flat. The condition number of N is related the ratio 
of the observing times in the most and least observed cell in the uv-plane. Sometimes this factor can be 10 3 or 10 4 . Another 
important pattern to notice are the fundamental differences between the eigenvalue spectra of U, G, and R. First off, R has 
mostly zero eigenvalues, because R is a block diagonal matrix with most of its blocks equal to zero. Second, despite the fact 
that U and G have nearly identical mathematical forms, U has stair-stepping eigenvalue spectrum while that of G is a much 
clearer exponential falloff. This is due to the much stronger correlations perpendicular to the line of sight in G. 



Pty'. If P and P^ are matrices that can be multiplied by 
quickly and if k(C) <C k(C), then we can greatly speed 
up our computation of y = C _1 (x — (x)). Our goal is 
to build up preconditioning matrices specialized to the 
forms of the constituent matrices of C. We construct 
preconditioners for C = N, generalize them to C = U + 
N, and then finally incorporate R and G to build the 
full prcconditioner. 

The result is the following: 



C = F^PuPrPN^P^PljF^ 



(62) 



3 10 



10 u 



Eigenvalues before 
preconditioning 



Eigenvalues after preconditioning 



Where Pu, Pr and Pn and preconditioners for U, 
r = R + G, and N respectively. A complete and ped- 
agogical explanation of this preconditioner and the mo- 
tivation for its construction and complex form can be 
found in Appendix[C] The definitions of the matrices can 
be found in Equations |C13[ |C25| and |C2| respectively. 

Despite its complex form and construction, the pro- 
cedure reduces fc(C) by many orders of magnitude. In 
Figure[5j in explicit contrast to Figure[4j we see a demon- 
stration of that effect. 



2. Computational Complexity of the Preconditioner 



50 



100 



150 



200 



250 



300 



N Largest Eigenvalue 



FIG. 5. The preconditioner for the conjugate gradient method 
that we have devised significantly decreases the range of eigen- 
values of C. Our preconditioner attempts to whiten the eigen- 
value spectra of the constituent matrices of C sequentially, 
first N, then R and G together, and finally U. By precondi- 
tioning, the condition number k(C), the ratio of the largest to 
smallest eigenvalues, is reduced from over 10 8 to about 10 1 . 



of our method over that of LTp^l 

First, let us enumerate the complexity of setting up 
the preconditioner for each matrix. Pn requires no setup 
since it only involves computing powers of the diagonal 



In Appendix [Cj we briefly discuss how the different 
steps in computing and applying this preconditioner scale 
with the problem size. If any of them scale too rapidly 
with N, we can quickly lose the computational advantage 



12 This section may be difficult to follow without first reading Ap- 
pendix [C] However, the key results can be found in Table [IT] 
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Operation 


Complexity 


Compute U eigensystem 


0(n*) 


Compute G eigensystems 


0(n 3 x ) + 0(n 3 y ) + 0(n 3 z ) 


Compute R eigensystems 


0(N R n 3 z ) 


Compute r eigensystems 


0(m(G z )(n x n y ) 3 ) 


Apply P N 


0(N log N) 


Apply Pu 


0(Nm(U z )) 


Apply P r 


0(Nm(G)) + 0(NN R m(H n )) 



10 



TABLE II. The computational complexity of setting up the 
preconditioner is, at worst, roughly 0(N 2 ), though this op- 
eration only needs to be performed once. Even for large data 
cubes, this is not the rate-limiting step in power spectrum es- 
timation. The computational complexity of applying the pre- 
conditioner ranges from 0(N log N) to O(NNr). For large 
data cubes with hundreds of bright point sources, the precon- 
ditioning time is dominated by Pr, which is in turn dom- 
inated by preconditioning associated with individual point 
sources. The computational complexity of the preconditioner 
therefore depends on the number of point sources considered 
"resolved," which scales with both field of view and with the 
flux cut. Here Nr is the number of resolved point sources in 
our field of view, na is the size of the box in voxels along the 
d th dimension, and m is the number of relevant eigenvalues 
of a matrix above the noise floor that need preconditioning. 



matrix N (see Appendix CI). Ptj requires the eigenvalue 
decomposition of XJ Z , the component of U alon g the line 



of sight, which takes 0(n z ) time (see Appendix C2|. 

We need the eigensystems of R and G to compute 
the eigensystem of T for Pr (see Appendix C3). R 



requires performing one eigenvalue decomposition of an 
n z x n z matrix for every resolved point source; that takes 
0(N^n 3 ) time. G simply requires three eigenvalue de- 
compositions: one for each matrix like those that appear 
for U in Equation |C3| whose total outer product is G. 
Thus, the complexity is 0(n%) + 0{n\) + 0(n 3 z ). 

Next, we need to compute the eigenvalues of r_i_,k> the 
components of T perpendicular to the line of sight cor- 
responding to each of the "relevant" (i.e. much bigger 
than the noise floor) eigenvalues of T along the line of 
sight (see Appendix C 3 for a more rigorous definition) . 



Using the notation we develop in Appendix |C 2[ we de- 
note the number of relevant eigenvalues of a matrix M 
as m(M). The number of times we need to decompose 
an n x n y x n x n y matrix is generally equal to the num- 
ber of relevant eigenvalues of G z , since the number of 
relevant eigenvectors is almost always the same for G 
and R. So we have then a computational complexity 
of 0(m(G z )(n x n y ) 3 ). Given the limited angular resolu- 
tion of the experiment and the flat sky approximation, 
we generally expect n x and n y to be a good deal smaller 
than rif, making this scaling more tolerable. All these 
scalings are summarized in Table |TT| 

Until now, all of our complexities have been 
O(NlogN) or smaller. Because these small incursions 
into bigger complexity classes are only part of the set-up 
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FIG. 6. For large data cubes and a fixed definition of what 
constitutes a "bright" point source, the complexity of pre- 
conditioning is dominated by the number of resolved point 
sources. Specifically, the complexity of preconditioning for V 
scales as N 2 ^ 3 because the number of resolved point sources is 
simply proportional to the solid angle of sky surveyed, which 
scales with the survey volume (and thus number of voxels, 
assuming fixed angular and frequency resolution) to the | 
power. This also confirms our assertion that the number of 
important eigenvalues of G and U should scale logarithmi- 
cally with data cube size (albeit with a different prefactor). 
Each of the data cubes is taken from the same survey with the 
same ratio of width to depth. The number of eigenvalues to 
precondition is computed assuming an eigenvalue threshold 
of = 1. 



cost, they are not intolerably slow as long as m(G z ) is 
small. This turns out to be true because the eigenvalue 
spectra of R n and G z fall off exponentially, meaning that 
we expect the number of relevant eigenvalues to grow only 
logarithmically. This is borne out in Figure [6] where we 
see exactly how the number of eigenvalues that need to 
be preconditioned scales with the problem size. 

Let us now turn to a far more important scaling: that 
of multiplying the preconditioner by a vector. The set-up 
needs to be done only once per Fisher matrix calculation; 
the preconditioning needs to happen for every iteration 
of the conjugate gradient method. Pn is the easiest; we 
only ever need to perform a Fourier transform or mul- 
tiply by a diagonal matrix. The complexity is merely 
O(NlogN). Pu only involves multiplying by vectors for 
each relevant eigenvalue of U z , so the total complexity is 
0(Nm(U z )). 

Finally, we need to assess the complexity of applying 
Pr- When performing the eigenvalue decomposition of 
r_i_,fe) we expect roughly the same number of eigenvalues 
to be important that would have been important from 
R and G separately for that k index. Each of those 
eigenvectors takes O(N) time to multiply by a vector. 
So we expect to deal with m(G) eigenvalues from G and 
one eigenvalue from each resolved point source for each 
relevant value of k, or about Njim(R n ). Applying Pp 
therefore is 0{Nm(G)) + 0(NN R m(R n )). If we keep 
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the same minimum flux for the definition of a resolved 
point source and if we scale our cube uniformly in all 
three spatial directions, then Nr oc TV 2 / 3 . 

This turns out to be the rate-limiting step in the en- 
tire algorithm. If we decide instead to only consider the 
brightest Nr to be resolved, regardless of box size, then 
applying Pp reduces to O(NlogN). Likewise, if we are 
only interested in expanding the frequency range of our 
data cube, the scaling also reduces to O(NlogN). We 
can comfortably say then that the inclusion of a model for 
resolved point sources introduces a complexity bounded 
by 0(N log N) and 0(N 5 / 3 ). We can see the precise 
computational effect of the preconditioner when we re- 
turn in Section |IV C| to assess the overall scaling of the 
entire algorithm. These results are also summarized in 
Table HU 



3. Preconditioner Results 

Choosing which eigenvalues are "relevant" in the con- 
stituent matrices of C and therefore need preconditioning 
depends on how these eigenvalues compare to the noise 
floor. In Appendix |C 2| we define a threshold 9 which 
distinguishes relevant from irrelevant eigenvalues by com- 
paring them to 9 times the noise floor. Properly choosing 
a value for 9, the threshold below which we do not pre- 
condition eigenvalues of U and T, presents a tradeoff. 
We expect that that too low of a value of 9 will precondi- 
tion inconsequential eigenvalues, thus increasing the con- 
jugate gradient convergence time. We also expect that 
too large of a value of 9 will leave some of the most im- 
portant eigenvalues without any preconditioning, vastly 
increasing convergence time. Both of these expectations 
are borne out by our numerical experiments, which we 
present in Figure [7] 

In this work, we choose 9 — 1 (all foreground eigenval- 
ues above the noise floor are preconditioned) for simplic- 
ity and to be sure that we are not skipping the precon- 
ditioning of any important foreground eigenvalues. One 
might also worry that more iterations of the algorithm 
provides more opportunity for round-off error to accumu- 
late and prevent convergence, as has sometimes proven 
the case in our numerical experiments. For lengthy or 
repeated calculations of the Fisher matrix, it is wise to 
explore the performance of several levels of precondition- 
ing, especially if it can garner us a another factor of 2 in 
speed. 



F. Fast Simulation of Foregrounds and Noise 



We concluded Section UlI Al with the fact that a Monte 
Carlo calculation of the Fisher matrix required the abil- 
ity to compute q from many different realizations of the 
foregrounds and noise modeled by C. In Sections |IIIB| 
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FIG. 7. This plot shows how computational time scales with 
8, the threshold for preconditioning, for the conjugate gra- 
dient method performed on an TV ~ 10 4 voxel data cube. It 
appears that for this particular covariance matrix, a minimum 
exists near 8 = 10 4 . At the minimum, the greater number of 
conjugate gradient iterations are balanced by quicker indi- 
vidual iterations (since each iteration involves less precondi- 
tioning) . We can see from this plot that there exists a critical 
value of 8 around 5 x 10 4 where the preconditioning of a small 
number of additional eigenvalues yields a large effect on the 
condition number of the resultant matrix. Without precon- 
ditioning, sufficiently large values of k(C) could also lead to 
the accumulation of roundoff error that prevents convergence 
of the conjugate gradient method. 



But where does x come from? When we want to es- 
timate the 21 cm temperature power spectrum of our 
universe, x will come from data cubes generated from 
real observations. But in order to calculate F, which is 
essential both to measuring p and estimating the error on 
that measurement, we must first be able to create many 
realizations of x drawn from our models for noise and 
foregrounds that we presented in Section [ill D[ 

A mathematically simple way to draw x from the right 
covariance matrix is to create a vector n of independent 
and identically distributed random numbers drawn from 
a normal distribution with mean and standard devia- 
tion 1. Then, it is easy to see that 



X^C 1 / 2 !! 



(63) 



through III E we have shown how to quickly calculate q 



from a data vector x using Equation 10 



is a random vector with mean and covariance C. Un- 
fortunately, computing C 1 / 2 is just as computationally 
difficult as computing C . 

In this last section of our presentation of our fast 
method for power spectrum estimation and statistics, we 
will explain how a vector can be created randomly with 
covariance C. We do so by creating vectors randomly 
from each constituent matrix of C, since each contribu- 
tion to the measured signal is uncorrelated. In Section 
|IIIF 5[ we will demonstrate numerically that these sim- 
ulations can be performed quickly while still being accu- 
rately described by the underlying statistics. 
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1. Resolved Point Sources 

The simplest model to reproduce in a simulation is the 
one for resolved point sources, because the covariance 
was created from a supposed probability distribution over 
their true fluxes and spectral indices. We start with a list 
of point sources with positions and with a specified but 
uncertain fluxes and spectral indices. These fluxes can 
either come from a simulation, in which case we draw 



them from our source count distribution (Equation 29 ) 
and spectral indices from a Gaussian distribution, or from 
a real catalog of sources with its attendant error bars. 
The list of sources does not change over the course of 
calculating the Fisher matrix. 

In either case, calculating a random xr requires only 
picking two numbers, a flux and a spectral index, for 
each point source and then calculating a temperature in 
each voxel along that particular line of sight. The lat- 
ter is easy, since we assume it is drawn from a Gaussian. 
The former can be quickly accomplished by numerically 
calculating the cumulative probability distribution from 
Equation |29| and inverting it. Each random xr, is there- 
fore calculable in 0(N R n z ) < <D{N) time. 



2. Unresolved Point Sources 

We next focus on U, which is more difficult. Our goal 
is to quickly produce a vector with specified mean and 
covariance. LT has already established what value we 
want for (xu) and (xq) with a calculation very similar 
to Equation 50 We need to figure out how to produce a 
vector with zero mean and the correct covariance. 

One way around the problem of calculating C 1 / 2 is 
to take advantage of the eigenvalue decomposition of the 
covariance matrix. That is because if C = QAQ T , where 
Q is the matrix that transforms into the eigenbasis and 
A is a diagonal matrix made up of the eigenvalues, then 
C 1 / 2 = QA 1 ^ 2 Q T . We already found the few important 



eigenvalues of U for our preconditioner (see Section C 2 1, 
so does this technique solve our problem? 

Yes and no. In the direction parallel to the line 
of sight, this technique works exceedingly well because 
only a small number of eigenvectors correspond to non- 
negligible eigenvalues. We can, to very good approxima- 
tion, ignore all but the largest eigenvalues (which cor- 
respond to the first few "steps" in Figure |2j) We can 
therefore generate random unresolved point source lines 
of sight in 0(n z m(XJ z )) with the right covariance. 

A problem arises, however, when we want to generate 
xu with the proper correlations perpendicular to the line 
of sight. Unlike the extremely strong correlations parallel 
to the line of sight, these correlations are quite weak. 
Weak correlations entail many comparable eigenvalues; 
in the limit that point sources were uncorrelated, XJ X (g) 
TJ y — > Ij_ and all the eigenvalues would be 1 (though 
the eigenvectors would of course be much simpler too). 
Utilizing the same technique as above would require a 



total complexity of 0(Nn x n y ) time, which is slower than 
we would like. 

However, the fact that both XJ X and XJ y are Toeplitz 
matrices allows us to use the same sort of trick we em- 
ployed to multiply our Toeplitz matrices by vectors in 
Section 



IIID1 



It turns out t 



to draw random vectors from U r (8)U y 1651 . 



ihat the circulant matrix in which we em- 
bed our covariance matrix must be positive-semidefinite 
for this technique to work. Although there exists such 
an embedding for any Gaussian covariance matrix, only 
Gaussians with coherence lengths small compared to the 
box size can be embedded in a reasonably small circulant 
matrix — exactly the situation we find ourselves in with 
Uj_. As such, we can generate random xu vectors in 
0(Nm(U z )\og(n x n y )) « O(AlogA). 



3. Galactic Synchrotron Radiation 

The matrix G differs from U primarily in the coher- 
ence length perpendicular to the line of sight. Unlike 
U, G has only a small handful of important eigenvalues, 
which means that random xq vectors can be generated 
in the same way we create line of sight components for 
xu vectors, which we described above. Since m(G) is so 
small (see Figure 4k and grows so slowly with data cube 
size (see Figure p ) , we can create random xg vectors in 
approximately (D[N). 



4- Instrumental Noise 

Finally, we turn to N, which is also mathematically 
simple to simulate. First off, (xn) = 0. Next, because 
N is dia gona l in the Fourier basis, we can simply use 



Equation 



63 



Because N = F^NF ± , 



N i/2 = F 1 L N 1/2 F_ L , 



(64) 



which is computationally easy to multiply by n because 
N is a diagonal matrix. The most computationally in- 
tensive step in creating random x^-vectors is the fast 
Fourier transform, which of course scales as O (N log N). 



5. Data Simulation Speed and Accuracy 

Before we conclude this section and move on to the 
results of our method as a whole, we verify what we 
have claimed in the above sections: namely that we can 
quickly generate data cubes with the correct covariance 
properties. Figure [8] verifies the speed, showing that the 
algorithm is both fast and well-behaved for large data 
cubes. 

In order to show that the sample covariance of a large 
number of random x vectors converges to the appropri- 
ate covariance matrix, we must first define a convergence 
statistic, e. We are interested in how well the matrix 



17 



10 



To 10 r 

C 

o 
o 

01 

w 

CD 

E 
i- 

o 10"' 



10 - 



10 



Resolved Point Sources 




~^ X Unresolved Point Sources 




x 

~~ r~ Galactic Synchrotron Radiation 




x 

Instrumental Noise 




"^ X Total 






' - 5^ , - ' 



10 



10 

Data Cube Size (Voxels) 



10 



10" 



10" 



0) 



10" 



10" 



* * " " " " " - - + - + + . 

*"*-*-*-*-*-*_* * x ' : - ~* - + -+ - +. 



x~ x- x. _x_ _ 



x Resolved Point Sources 

* Unresolved Point Sources 

+ Galactic Synchrotron Radiation 
Instrumental Noise 

• Total Covariance (R+U+G+N) 



10 10 
Number of Random Data Cubes 



10 



FIG. 8. In order to estimate the Fisher matrix via a Monte 
Carlo, we need to draw random data cubes from our modeled 
covariance. Here we show that we can do so in 0(N log N) 
by plotting computational time as a function of problem size 
for generating a random x for each of the constituent sources 
of x. In practice, generating random x vectors is never the 
rate-limiting step in calculating F. 



FIG. 9. We verify that our technique for quickly generating 
random data cubes actually reproduces the correct statistics 
by generating a large number of such cubes and calculating 
their sample covariances. Plotted here is the error statis- 
tic detailed in Equation |65| The color-matched dotted lines 
are the expected convergences for correlated Gaussians from 
Equation |66| 



converges relative to the total covariance matrix C. For 
example, for R we choose: 



e(R) 



E, 



Ri 



Ri 



(65) 



where R is the sample covariance of n random xr vectors 
drawn from R. If each x is a Gaussian random vector 
then the expected RMS value of e is: 



e(ny 



i 



E 



r>2 

hJ ij 



(trR) 2 



E 



(7 2 



(66) 



In Figure |9j we see that all four constituent matrices of 
C converge like n"" 1 / 2 , as expected, with very nearly the 
prefactor predicted by Equation [66] We can be confi- 
dent, therefore, in both the speed and accuracy of our 
technique for generating random vectors. 
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FIG. 10. Our Monte Carlo converges to the correct Fisher 
matrix as n _1//2 , as expected. In this plot, we compare the 
sample covariance of many q vectors generated from small 
data cubes to an exact calculation of F by calculating the 
relative error of their diagonals. 



G. Method Accuracy and Convergence 

Before we move on to discuss some of the results of our 
method, it is worthwhile to check that no unwarranted 
approximations prevent it from converging to the exact 
form of the Fisher information matrix in Equation |14| 
Since calculating F exactly can only be done in O(N^) 
time, we perform this test in two parts. 

First, we measure convergence to the exact Fisher ma- 
trix for a very small data cube with only 6x6x8 voxels. 
Taking advantage of Equation [21] we generate an esti- 
mate of F, which we call F, from the sample covariance 



of many independent q vectors. We compare these F, 
which we calculate periodically along the course of the 
Monte Carlo, with the F that we calculated directly us- 
ing Equation [14] As we show in Figure [10] the sample 
covariance of our q vectors clearly follows the expected 
vT 1 ! 2 convergence to the correct result. 

However, we are more concerned with the accuracy of 
the method for large data cubes which cannot be tackled 
by the LT method. Unfortunately, for such large data 
cubes, we cannot directly verify our result except in the 
case where C = I. In concert with other tests for agree- 
ment with LT, we also check that the method does indeed 



18 



10 u 



[0 

55 



0) 

E? 
<o 

> 10 

o 

O 



10 



10 10 10 

Number of Monte Carlo Simulations 



FIG. 11. For large data cubes, the convergence of the sam- 
ple covariance of our q vectors to F also nicely follows the 
expected n _1//2 scaling. We perform this analysis on a data 
cube with 1.5 x 10 5 voxels analogously to that which we per- 
formed in Figure [TO] except that we use the sample covariance 
of double the number of Monte Carlo iterations as our "true" 
Fisher matrix. This explains the artificially fast convergence 
we see in the last few points of the above plot. 



converge as n -1 / 2 by comparing the convergence of sub- 
sets of the q" vectors up to n/2 Monte Carlo iterations to 
the reference Fisher matrix, which we take to be the sam- 
ple covariance of all n iterations. As we show in Figure 
[TT| our expectation is borne out numerically. 



H. Method Summary 

We have constructed a technique that accelerates the 
LT technique to O(NlogN) and extends it to include 
bright point sources in, at worst, 0(N 5 / 3 ). We do so 
by generating random data vectors with the modeled 
foreground and noise covariances and calculating the 
Fisher information matrix via Monte Carlo. We are able 
to calculate individual inverse variance weighted power 
spectrum estimates quickly using the conjugate gradient 
method with a specially adapted preconditioner. 

Our method makes a number of assumptions, most of 
which are not shared with the LT method. Our method 
can analyze larger data sets but at a slight loss of gen- 
erality. Although we have mentioned these assumptions 
throughout this work, it is useful to summarize them in 
one place: 

• Our method relies on a small enough data cube 
perpendicular to the line of sight that it can be 
approximated as rectilinear (see Figure [l]). 

• We approximate the natural log of the quotient 
of frequencies in the exponent of our point source 
covariance matrix by a leading-order Taylor ex- 
pansion (Equation 53 1 . This assumption makes 



along the line of sight and thus amenable to fast 
multiplication using Toeplitz matrix techniques. 
This is a justified assumption as long as the co- 
herence length of the foregrounds is much longer 
than the size of the box along the line of sight. 

• Our ability to precondition our covariance matrix 
for the conjugate gradient method depends on the 
approximation that the correlation length of U per- 
pendicular to the line of sight, due to weak spatial 
clustering of point sources, is not much bigger than 
the pixel size. For the purposes of preconditioning, 
we approximate Uj_ to be the identity (see Section 
The longer the correlation length of Uj_ , 



C2| 



the 

longer the conjugate gradient algorithm will take 
to converge. 



• Likewise, the speed of the preconditioned conjugate 
gradient algorithm depends on the similarities of 
the eigenmodes of the covariances for R, U, and G 
along the line of sight. The more similar the eigen- 
modes are (though their accompanying eigenvalues 
can be quite different) the more the precondition- 
ing algorithm can reduce the condition number of 
C. We believe that this similarity is a fairly general 
property of models for foregrounds, though the in- 
troduction of a radically different foreground model 
might require a different preconditioning scheme. 

• We assume that the number of Monte Carlo iter- 
ations needed to estimate the Fisher information 
matrix is not so large that that it precludes an- 
alyzing large data cubes. Because the process of 
generating more artificial q vectors is trivially par- 
allelizable, we do not expect getting down to the 
requisite precision on the window functions to be 
an insurmountable barrier. 

One common theme among these assumptions, espe- 
cially the last three, is that the approximations we made 
to speed up the algorithm can be relaxed as long as we 
are willing to accept longer runtimes. This reflects the 
flexibility of the method, which can trade off speed for 
accuracy and vice versa. 



IV. RESULTS 

Now that we are confident that our method can ac- 
curately estimate the Fisher information matrix and can 
therefore calculate both power spectrum estimates from 
data and the attendant error bars and window functions, 
we turn to the first end-to-end results of the algorithm. In 
this Section, we demonstrate the power our method and 
the improvements that it offers over that of LT. First, 



the foreground covariances translationally invariant 



in Section IV A we show that our technique reproduces 
the results of LT in the regions of Fourier space where 
they overlap. Then in Section |IV B| we highlight the im- 
provements stemming from novel aspects of our method, 
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especially the inclusion of the pixelization factor |$(k)| 2 
in Q Q and N and the separation of point sources into 
resolved point sources (R) and unresolved point sources 
(U), by showing how different parts of our algorithm af- 
fect F. In Section IV C we examine just how much faster 
our algorithm is than that of LT, and lastly, in Section 
|IVD| we forecast the cosmological constraining power of 
the 128-tile deployment of the MWA. 



A. Comparison to Liu & Tegmark 

First we want to verify that our method reproduces 
that of LT in the regions of Fourier space accessible to 



both methods. Figure 12 provides an explicit comparison 
to LT's Figures 2 and 8. These plots show the shaded re- 
gions representing their method and over-plotted, white- 
outlined contours representing ours. Both are on the 
same color scale. These plots show error bars in tempera- 
ture units in fcj_-fcj| space and a selection of window func- 
tions in both the case where C = N and C = U + G + N. 
They are generated from the same survey geometry with 
identical foreground and noise parameters. In the regions 
where the methods overlap, we see very good agreement 
between the two methods. 

In addition to the modes shown in the shaded regions 
in Figure [l2j the LT method can access Fourier modes 
longer than the box size, which we cannot. This is no 
great loss — these modes are poorly constrained by a sin- 
gle data cube. Moreover, they are generally those most 
contaminated by foregrounds; the low-fcj_ modes will see 
heavy galactic synchrotron contamination while the low- 
kii modes will be contaminated by types of the fore- 
grounds. We imagine that very low-fcj^ Fourier modes, 
those that depend on correlations between data cubes 
that cannot be joined without violating the flat-sky ap- 
proximation, will still be analyzed by the LT method. 
Because our method can handle many more voxels, it ex- 
cels in measuring both medium and high-/c modes that 
require high spectral and spatial resolution. 



B. Novel Effects on the Fisher Matrix 

A simple way to understand the different effects that 
our forms of C and Q Q have on the Fisher information 
matrix, especially the novel inclusions of R and |<I>(k)| 2 , 
is to build up the Fisher matrix component by compo- 
nent. In Figure [13] we do precisely that by plotting the 
diagonal elements of F. These diagonal elements are re- 
lated to the vertical error bars on our band powers. Large 
values of F aa correspond to band powers about which we 
have more information. 

In the top two panels of Figure [13] we show the first 
novel effect that our method takes into account. In them, 
we can see how modeling the finite size of our voxels af- 
fects the information available in the case where C = I 
(the color scale for these two panels only is arbitrary). In 



the top left panel, we have set |$(k)| 2 = 1, which corre- 
sponds to the delta function pixelization of LT. We see 
that the amount of information depends only on k± . This 
is purely a binning effect: our bins in k± are concentric 
circles with constant increments in radius; higher values 
of k± incorporate more volume in Fourier space, except 
at the high-fc^ edge where the circles are large enough 
to only include the corners of the data cube. In the top 
right panel, we see that including |$(k)| 2 ^ 1 affects our 
ability to measure high-fc modes, which depends increas- 
ingly on our real space resolution and is limited by the 
finite size of our voxels. 

In the middle left panel, we now set C = N. In com- 
parison to C = I, the new covariance matrix (and thus 
new vector x for the Monte Carlo calculation of F), shifts 
the region of highest information to a much lower value 
of k±. Though there are fewer Fourier modes that sam- 
ple this region, there are far more baselines in the array 
configuration at the corresponding baseline length. Our 
noise covariance is calculated according to our derivation 
in Section fill D 31 for 1000 hours of observation with the 
128-tile deployment of the MWA [27] , 

We next expand to C = U + N for the middle right 
panel, where we have classified all point sources as "un- 
resolved." In other words, we take 5* cu t in Equation [31] 
to be large (we choose 200 Jy, which is representative of 
some of the brightest sources at our frequencies). As we 
expect, smooth spectrum contamination reduces our abil- 
ity to measure power spectrum modes with low values of 
fcii . This is because of the exponentially decaying eigen- 
value spectrum of Um , most of which is smaller than the 
eigenvalues of N. The effect is seen across k± because the 
characteristic clustering scale of unresolved point sources 
is smaller than the pixel size; localized structure in real 
space corresponds to unlocalized power in Fourier space. 

In the bottom left panel, we have included informa- 
tion about the positions of roughly 200 resolved point 
sources above 100 mJy, with random fluxes drawn from 
our source count distribution (Equation 29 ) and ran- 



dom spectral indices drawn from a Gaussian centered on 
n n = 0.5 with a width of 0.15. By doing this, we reduce 
S cut in our model for U down to 100 mJy. Including all 
this extra information — positions, fluxes, flux uncertain- 
ties, spectral indices, and spectral index uncertainties — 
provides us with significantly more Fisher information 
at low-fey where foregrounds dominate and thus smaller 
errors on those modes. Additionally, by incorporating 
resolved point sources as part of our inverse covariance 
weighting, we no longer have to worry about forward 
propagating errors from any external point-source sub- 
traction scheme. In the left panel of Figure [14] we see the 
ratio of this panel to the middle right panel. 

Finally, in the bottom right panel of Figure [13] we 
show the effect of including Galactic synchrotron radia- 
tion. Adding G has the expected effect; we already know 
that G has only a few important eigenmodes which corre- 
spond roughly to the lowest Fourier modes both parallel 
and perpendicular to the line of sight. As a result, we 



20 



o 
Q. 




10" 



10" 




10" 




10" 



k (cMpc -1 ) 



n 



3.5 



E 



2.5 



I 

n 





I 



10" 



k (cMpc -1 ) 



o 



1.5 



0.8 



g 

0.6 to 
DC 

0.4 I 
Z> 



0.2 



FIG. 12. Our method faithfully reproduces the results of LT in the regions of Fourier space accessible to both methods. Here 
we recreate both the vertical error bar contours from LT's Figure 8 (top two panels) and a few selected window functions from 
LT's Figure 2 (bottom two groups of panels). The shaded regions represent the LT results; the white-outlined, colored contours 
are overplotted to show our results. Both are on the same color scale. Following LT, we have plotted both the case without 
foregrounds (C = N, left two panels) and the case with foregrounds (C = N + U + G, right two panels), which allows us to 
get a sense for the effects of foregrounds on our power spectrum estimates, error bars, and window functions. 



only see a noticeable effect in the bottom left corner of 
the k±-k\\ plane; we include the ratio of the two figures 



in the right panel of Figure 14 for clarity. Otherwise, our 
Galaxy has very little effect on the regions of interest. In 
fact, the similarity between the this panel and the middle 
left panel tells us something very striking: in the regions 
of Fourier space that our data most readily probes, fore- 
grounds (once properly downwcightcd) should not prove 
an insurmountable obstacle to power spectrum estima- 
tion. 



The set of plots in Figure 13 is useful for developing a 
heuristic understanding of how noise and foregrounds af- 
fect the regions in which we can most accurately estimate 
the 21 cm power spectrum. With it, we can more easily 



identify the precise regions of fc-space that we expect to 
be minimally contaminated by foregrounds and can thus 
tailor our instruments and our observations to the task 
of measuring the 21 cm power spectrum. 



C. Computational Scaling of the Method 

Now that we understand how our technique works, we 
want to also see that it works as quickly as promised by 
and achieves the desired computational speed up over the 
LT method. Specifically, we want to show that we can 
achieve the theoretical 0(N log N) performance in repro- 
ducing the results of LT. We also want to better under- 
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FIG. 13. The Fisher information matrix provides a useful window into understanding the challenges presented to measuring 
the 21 cm power spectrum by the various contaminants. In this figure, we add these effects one by one, reading from left to 
right and top to bottom, to see how the diagonal of the Fisher matrix (expanded along the k± and k» directions), is affected. 
Brighter regions represent, roughly speaking, more information and thus smaller error bars (in power spectrum units). We 
comment in more detail on each panel individually in Section |IV B[ including upon the advantages of the novel aspects of our 
technique. Top left: the covariance matrix is taken to be the identity and pixelization effects on Q a are ignored. Top right: 
the pixelization factor |<I>(k)| 2 is included and not set to 1. Middle left: the noise expected from 1000 hours of observation 
with the MWA 128-tile configuration is included. Middle right: all point sources (up to 200 Jy) are modeled as unresolved; 
all information about their positions is ignored. Bottom left: resolved point sources are included in the model, with all point 
sources dimmer than 100 mjy considered unresolved. Bottom right: in addition to bright point sources, galactic synchrotron 
is also included. 
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Improvements from Degradation from 

Resolving Point Sources: Galactic Synchrotron: 




FIG. 14. By comparing the Fisher matrices arising from the various covariance models we explore in Section [IV B| and Figure 
|13| the precise improvements are brought into sharper focus. In the left panel, we can see the information we gain by explicity 
including resolved point sources. Shown here is the ratio of the bottom left panel of Figure [13] to the middle right panel. By 
taking into account precise position, flux, and spectral information and uncertainties, we improve our ability to measure the 
power spectrum at the longest scales parallel to the line of sight, effectively ameliorating the effects of foregrounds. In the 
right panel, we see the remarkably small effect that the galactic synchrotron radiation has on our abilty the measure the 21 cm 
power spectrum. Shown here is the ratio of the bottom right panel of Figure [13] to the botton left. Because we take spatial 
information into account, the strong spatial and spectral coherence of the signal from our Galaxy is confined to the bottom left 
corner of the k\\-k± plane. 



stand the computational cost of including resolved point 
sources so as to compare that cost to benefits outlined in 
Section |IVB| We have therefore tested the algorithm's 
speed for a wide range of data cube sizes; we present the 



results of that study in Figure 15 



In this figure, we show the combined setup and run- 
time for power spectrum estimates including 1000 Monte 
Carlo simulations of q for estimating the Fisher matrix 
on a single modern CPU. For each successive trial, we 
scale the box by the same ratio in all three dimensions. 
Because we maintain a fixed flux cut, increasing the linear 
size of the box by a factor of two increases the number of 
resolved point sources in the box by a factor of 4 and the 
number of voxels by a factor of 8. With any more than a 
few point sources, the computational cost becomes dom- 
inated by point sources, leading to an overall complexity 
of O(NNr). In this case, the largest data cubes include 
about 400 point sources over a field of about 50 square 
degrees, accounting for about 15% of the lines of sight in 
the data cube. 

For any given analysis project, these exists a trade- 
off between including additional astrophysical informa- 



tion into the analysis and the computational complexity 
of that analysis; at some point the marginal cost of a 
slower algorithm exceeds marginal benefit of including 
more bright point sources. It is beyond the scope of this 
paper to prescribe a precise rubric for where to draw 
the line between resolved and unresolved point sources. 
However, we can confidently say that the algorithm runs 
no slower than 0(iV 5 / 3 ) and can often run at or near 
O (N log N) if only the brightest few point sources are 
treated individually. 



D. Implications for Future Surveys 

Though the primary purpose of this paper is to de- 
scribe an efficient method for 21 cm power spectrum 
analysis, our technique enables us immediately to make 
predictions about the potential performance of upcoming 
instruments. In this section we put all our new machin- 
ery to work in order to see just how well the upcoming 
128-tile deployment of the MWA can perform. 
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FIG. 15. Our algorithm scales with the number of voxels, N, 
as O(TVlogTV) in the best case and as C(iV 5/3 ) in the worst, 
depending on the treatment of bright point sources. If we 
choose to ignore all information about the location, bright- 
ness, and spectra of bright point sources, we can estimate 
the power spectrum in 0(N log N). If we choose to take into 
account this extra information, the algorithmic complexity 
increases to O(NNr), where Nr is the number of bright, re- 
solved point sources. For a fixed minimum flux for "bright" 
sources, this leads to 0(N 5 ^ 3 ) complexity for uniform scal- 
ing in all three dimensions. Both scenarios represent a major 
improvement over the LT method, which scales as 0(N 3 ). 
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We envision 1000 hours of integration on a field that is 
9° on each side, centered on z = 8 with Az = 0.5. With a 
frequency resolution of 40 kHz and an angular resolution 
of 8 arcminutes, our data cube contains over 10 6 voxels. 
We completed over 1000 Monte Carlo iterations on our 
12 core server in about one week. We use the foreground 
parameters outlined above in Sections |II D| and |III D| In 
Figures 16 through 18 we show the diagonal elements of 
the Fisher matrix we have calculated, the temperature 
power spectrum error bars, and a sampling of window 
functions. 

In Figure[l6]we plot the diagonal elements of the Fisher 
matrix, which are related directly to the power spectrum 
errors. Drawing from the discussion in Section [IVB| we 
can see clearly the effects of the array layout (and thus 
noise), of foregrounds (included resolved point sources), 
and of pixelization. Interestingly, until pixelization ef- 
fects set in at the highest values of k\\ , the least contam- 
inated region spans a large range of values of fey . One 
way of probing more cosmological modes is to increase 
the frequency resolution of the instrument. The num- 
ber of modes accessible to the observation scales with 
(Ai/) -1 , though the amplitude of the noise scales scales 
with (Ai/) -1 / 2 . As long as the noise level is manage- 
able and the cosmological signal is not dropping off too 
quickly with k, increasing the frequency resolution seems 
like a good deal. 



In Figure 17 we show the vertical error bars that we ex- 



FIG. 16. The diagonal of the Fisher matrix predicted for 1000 
hours of observation with the MWA with 128 tiles shows the 
region of power spectrum space least contaminated by noise 
and foregrounds. Noise, and thus array layout, dominates 
the shape of the region of maximum information, creating a 
large, vertical region at a value of k± corresponding to the 
typical separation between antennas in the compact core of 
the array. The contaminating effects of the foregrounds are 
clearly visible at low-feii . 



pect on power spectrum estimates in temperature units. 
The most important fact about this plot is that there is 
a large region where we expect that vertical error bars 
will be sufficiently small that we should be able to detect 
a 10 mK signal with signal to noise greater than 1. This 
is especially the case at fairly small values of k, which 
is surprising since these k modes were supposed to be 
the most contaminated by foregrounds. There are two 
reasons why this happens. 

First, the conversion to temperature units (Equation 
251 introduces a factor of (/c^fcy) 1 / 2 that raises the error 
Ears for larger values of k. Second, the strongest fore- 
ground modes overlap significantly with one of the k = 
modes of the discrete Fourier transform, which we ex- 
clude for our power spectrum estimate (this is just the 
average value of the cube in all three directions, which is 
irrelevant to an interferometer that is insensitive to the 
average) . 
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FIG. 17. The expected error bars in temperature units on 
decorrelated estimates of the power spectrum highlight a siz- 
able region of fc-space where we expect to be able to use the 
MWA with 128 tiles to detect a fiducial 10 mK signal with a 
signal to noise ratio greater than 1. Perhaps surprisingly, the 
smallest error bars are still on the smallest k modes acessible 
by our method, though some of them are contaminated by 
large foregrounds. This is because our conversion to temper- 
ature units includes a factor of (fcx^n) 1 ^ 2 , which accounts for 
the difference between this Figure and Figure |16| From the 
shape of the region of smallest error, we can better appreci- 
ate the extent to which noise and our array layout determines 
where in fc-space we might expect to be able to detect the 
EoR. The noisiness at high-fc is due to Monte Carlo noise and 
can be improved with more CPU hours. 



Another way to think about it is this: because the co- 
herence length of the foregrounds along the line of sight 
is much longer than the size of any box small enough 
to comfortably ignore cosmological evolution, we expect 
that the most contaminated Fourier mode will be pre- 
cisely the one we ignore. Unlike the LT method, our 
method cannot easily measure modes much longer than 
the size of the data cube. Along the line of sight, these 
modes have very wide window functions and are the most 
contaminated by foregrounds. Perpendicular to the line 
of sight, these modes are better measured by considering 
much larger maps where the flat sky approximation no 



longer holds. For the purposes of measuring these low-A; 
modes, the LT method can provide a useful complement 
to ours. Large-scale modes from down-sampled maps 
can be measured by LT; smaller-scale modes from full- 
resolution maps can be measured by our method. Then 
both can be combined to estimate the power spectrum 
across a large range of scales. 

And finally, in Figure [TS] we show many different win- 
dow functions for a selection of values of k± and kn that 
spans the whole space. In general, these window func- 
tions are quite narrow, meaning that each band power 
measurement probes only a narrow range of scales. The 
widest windows we see look wide for two reasons. First, 
linearly separated bins appear wider at low-fc when plot- 
ted logarithmically. Second, foregrounds cause contami- 
nated bins to leak into nearby bins, especially at low- fey 
and moderate k±. We saw hints of this effect in Figure 
12 when comparing noise-only simulations to simulations 
with both noise and foregrounds. 

In the vast majority of the k_i_-kii plane, the window 
functions seem to be dominated by the central bin and 
neighbors. Except for edge cases, no window function 
has contributions exceeding 10% from bins outside the 
central bin and its nearest neighbors. This means that we 
should be happy with our choice of Fourier space binning, 
which was designed to have bin widths equal to those of 
our data cube before zero padding. We also know that 
significantly finer binning would be inappropriate, so we 
do not have to worry about the tradeoff between fine 
binning of the power spectrum and the inversion of the 
Fisher matrix. Therefore, with the 128-tile deployment 
of the MWA, we can be confident that our estimates of 
the power spectrum correspond to distinct modes of the 
true underlying power spectrum. 



V. CONCLUSIONS 

With this paper, we have presented an optimal algo- 
rithm for 21 cm power spectrum estimation which is dra- 
matically faster than the Liu & Tegmark (LT) method 
(51) . scaling as 0(N log N) instead of 0(N 3 ), where N 
is the number of voxels in the 3D sky map. By using the 
inverse variance weighted quadratic estimator formalism 
adapted to 21 cm tomography by the LT method, we 
preserve all accessible cosmological information in our 
measurement to produce the smallest possible error bars 
and narrow window functions. Moreover, our method 
can incorporate additional information about the bright- 
est point sources and thus further reduce our error bars 
at the cost of some — but by no means all — of that com- 
putational advantage. Our method is highly paralleliz- 
able and has only modest memory requirements; it never 
needs to store an entire N x N matrix. 

Our method achieves this computational speed-up for 
measuring power spectra, error bars, and window func- 
tions by eliminating the time-consuming matrix opera- 
tions of the LT method. We accomplish this using a com- 
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FIG. 18. We can see from a sampling of window functions that our band power spectrum estimates jp represent the weighted 
averages of p a over a narrow range of scales, especially at higher values of k± and fcii . The widest window functions can be 
attributed to binning (with linearly binned data, low-fc bins look larger on logarithmic axes) and to foregrounds. This is good 
news, because it will enable us to accurately make many independent measurements of the power spectrum and therefore better 
constrain astrophysical and cosmological parameters. 



bination of Fourier, spectral, and Monte Carlo techniques 
which exploit symmetries and other physical properties 
of our models for the noise and foregrounds. 

We have demonstrated the successful simulation of er- 
ror bars and window functions for the sort of massive data 
set we expect from the upcoming 128-tile deployment of 
the MWA — a data set that cannot be fully utilized us- 
ing only the LT method. Our forecast predicts that 1000 
hours of MWA observation should be enough to detect 
the fiducial 10 mK signal across much of the kn-k± plane 
accessible to the instrument. Moreover, we predict that 
the horizontal error bars on each band power estimate 
will be narrow, allowing each estimate to probe only a 
small range of scales. 

Our results suggest several avenues for further re- 
search. Of course, the most immediate application is to 
begin analyzing the data already being produced by in- 
terferometers like LOFAR, GMRT, MWA, and PAPER 
as they start accumulating the sensitivity necessary to 
zero in on a detection of the EoR. The large volume of 



data these instruments promise to produce might make it 
useful to explore ways of further speeding up the Monte 
Carlo estimation of the Fisher matrix. There is signifi- 
cant redundancy in our calculated Fisher matrix because 
the window function shapes vary only relatively slowly 
with fc-scale. We believe that one can reduce the number 
of Monte Carlo simulations needed to attain the same 
accuracy by adding a postprocessing step that fits the 
Fisher matrix to a parametrized form. This should work 
best in the regions of the fcii-fcj. plane that are fairly un- 
contaminated by foregrounds, where Fisher matrix ele- 
ments are expected to vary most smoothly. It may also 
be possible to speed up the Monte Carlo estimation of 
the Fisher matrix using the trace evaluation technology 
of [33]. 

The forecasting power of our method to see whether 
a particular observing campaign might reveal a partic- 
ular aspect the power spectrum need not be limited to 
measurements of the EoR. Our method provides an op- 
portunity to precisely predict what kind of measurement, 



2G 



and what kind of instrument, might be necessary for ob- 
serving 21 cm brightness temperature fluctuations during 
the cosmic dark ages. Our method should prove useful 
for weighing a number of important design considera- 
tions: What is the optimal array configuration? What is 
the optimal survey volume? What about angular resolu- 
tion? Spectral resolution? And in what sense are these 
choices optimal for doing astrophysics and cosmology? 

To help answer such questions, our technique could be 
used to compare the myriad of ideas for and possible im- 
plementations of future projects like HERA and the SKA 
and even to help find an optimal proposal. For example, 
one plan for achieving large collecting area is building 
a hierarchically regular array (a so-called "Omniscope") 
that takes advantage of FFT correlation |3T] and redun- 
dant baseline calibration [66] . There exist many array 
configurations that fit into this category and it is not 
obvious what the optimal Omniscope might look like. 

The quest to detect a statistical signal from the Epoch 
of Reionization is as daunting as it is exciting. It is 
no easy task to find that needle in a haystack of noise 
and foregrounds. However, now that we are for the first 
time armed with a method that can extract all the cos- 
mological information from a massive data set without 
a prohibitive computational cost, we can feel confident 
that a sufficiently sensitive experiment can make that 
first detection — not just in theory, but also in practice. 



ACKNOWLEDGMENTS 

The authors wish to thank Gianni Bernardi, Michael 
Betancourt, Katherine Deck, Jacqueline Hewitt, Andrew 
Lutomirski, Michael Matejek, Miguel Morales, Ue-Li 
Pen, John Rutherford, Leo Stein, Christopher Williams, 
and Phillip Zukin for many useful discussions and ad- 
vice. This work is partially supported by the Bruno Rossi 
Graduate Fellowship and NSF Grant AST-1105835. A.L. 
acknowledges support from the Berkeley Center for Cos- 
mological Physics. 



right in each successive row, then it is a special kind of 
Toeplitz matrix called a "circulant" matrix. Circulant 
matrices are diagonalized by the discrete Fourier trans- 
form |67j . Given a circulant matrix C with first column 
c, the product of C and some arbitrary vector v can be 
computed in O(NlogN) time because 



Cv = F f diag(Fc) Fv, 



(Al) 



where F is the unitary, discrete Fourier transform matrix 
67 . Reading Equation Al from right to left, we see that 



every matrix operation necessary for this multiplication 
can be performed in O(NlogN) time or better. 

Conveniently, any symmetric Toeplitz matrix can be 
embedded in a circulant matrix twice its size. Given 
a symmetric Toeplitz matrix T, we can define another 
symmetric Toeplitz matrix S with an arbitrary constant 
along its main diagonal. If we specify that the rest of 
the first row (besides the first entry) is the reverse of 
the rest of the first row of T (again ignoring the first 
entry), the fact that the matrix is Toeplitz and symmetric 
completely determines the other entries. For example, 

/ 2 3 \ 
then S = 2 2 . (A2) 
^ 3 2 j 

It is straightforward to verify that the matrix C, defined 
as 



if T = 




C = 




(A3) 



is a circulant matrix. We can now can multiply C by 
a zero-padded vector so as to yield the product of the 
Toeplitz matrix and the original vector, Tx, that we are 
looking for: 



(A4) 



Therefore, we can multiply any Toeplitz matrix by a vec- 
tor in 0(N log N). 




Appendix A: Toeplitz Matrices 

In this appendix, we briefly review how to rapidly mul- 
tiply by Toeplitz matrices. We need to employ the advan- 
tages of Toeplitz matrices because the assumption that 
our covariance matrices are diagonal in real space or in 
Fourier space, as was the case in [53], break down for 
covariance matrices with coherence lengths much larger 
than the box size. 

A "Toeplitz" matrix is any matrix with the same num- 
ber for every entry along its main diagonal and with ev- 
ery other diagonal similarly constant [67 . In general, a 
Toeplitz matrix is uniquely defined by the entries in its 
first row and its first column: if i > j then = Ti+j-^i 



and if i < j then T,j = T] 



i,i-i+j 



If the first row of a ma- 



Appendix B: Noise Covariance Matrix Derivation 

In this appendix, we derive the the form of N, the noise 
covariance matrix, in Equation [57] by combining the form 
of P Ar (k, A), the noise power spectrum, in Equation 



with Equation 33 
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_ which relates N to P N (k, A). To ac 
complish this, we simplify P N (k, A) into a form that is 
more directly connected to our data cube. We then ap- 
proximate the integrals in Equation [33] by assuming that 
the M^-coverage is piecewise constant in cells correspond- 
ing to our Fourier space gridpj 



trix is repeated with a cyclic permutation by one to the 



13 We also assume that measurements in nearby ti-u-cells are uncor- 
rected, which may not be true if the baselines are not coplanar; 
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To simplify P N (k,\), we first note that because the 
term B(k, A) in Equation 32 represents the synthesized 



beam and is normalized to peak at unity, we can rein- 
terpret the factor of (/ cover ) _1 i3 _2 (k, A) as an inverted 
and normalized uw-coverage. When f COVCT = i ; the ar- 
ray has uniform coverage. We want to replace the fac- 
tor (/ covcr )~ 1 _B _2 (k, A) with a quantity directly tied our 
choice of pixelization of the uu-plane and written in 
terms of the simplest observational specification: the to- 
tal time that baselines spend observing a particular uv- 
cell, t(k, A). We already know that the noise power is 
inversely proportional to that time because more time 
yields more independent samples. 

To relate ^(k, A) to (/ cover )- 1 S- 2 (k, A), we want to 
make sure that the formula yields the same answer for 
peak density in the case of a complete coverage. In other 



words, we want to find the constant t max such that 



i(k,A) 



cover \ — 1 / ) — 2 



B-(k,A). 



(Bl) 



The time spent in the most observed cell is related to the 
size of the cell in the icu-plane, the density of baselines 
in that cell, and the total integration time of the obser- 
vation, t. The cell size is determined by the pixelization 
of our data cube. We have divided each slice of our data 
cube of size L x x L y into n x x n y pixels. In Fourier space, 
this corresponds a pixel size of 



Afc T . 



2tt 



2tt 



d M A6 x n x ' 



(B2) 



where A6 X and is the angular pixelization in the x direc- 
tion. An equivalent relation is true for the y direction. 
Since Au = Ak x d>M I '(27r), we have that the area in uv- 
space of each of our grid points is 



AvAu 



1 



1 



1 



A8 x n x A6 y n y O p 



(B3) 



The maximum density of baselines is the density of the 
autocorrelations^] which is 



N„. 



(B4) 



where the quantity (A ant /X 2 ) is the area in the ww-plane 
associated with a single baseline [5]. We thus have that 



,AuAtiT = 



AUtA 2 r 

a nt ^ p ix ^ £ y 



(B5) 



instead N would have to be modeled as sparse rather than diag- 
onal in angular Fourier space. 
14 If the use of autocorrelations (which most observations throw 
out, due to their unfavorable noise properties) is troubling, then 
it is helpful to recall that for a large and fully-filled array, the 
Mii-density of the shortest baselines is approximately the same as 
the itv-density of the autocorrelations. 



Now we can substitute Equation |B1| into Equation 32 to 
get a more useful form of -P JV (k, A): 



^ A Ts ys yd 2 M 



1 



^ant^pix^^y *(k_L, A) ' 



(B6) 



In general, f(kj_,A) depends in a nontrivial way on 

the array layout; As such, the integral expression for 

N in Equation 33 with this form of P N (k, A) is only 



analytically tractable along the line of sight. Integrating 
k z , we get that 



Ni. 



^T 2 ys yd 2 M 



^ z ^ant^pix^a^y 

ik x (xi-xj)+ik v (yi - J/, ) _ 



jl{k x Ax/2)jl{k y Ay/2)x 
1 



i(kj.,A() (2tt) 2 



(B7) 



We note that N is uncorrelated between frequency chan- 
nels, as we would expect. 

Along the other two dimensions, we will approach the 
problem by approximating the integrand as piecewise 
constant in Fourier cells, turning the integral into a sum 
and the dk into a Afc. We will use the index I to run over 
all Fourier modes perpendicular to the line of sight. Us- 
ing the fact that the line of sight voxel length Az = yAv 
and that L x L y — ^l-p\ x d 2 M n x n y , we have that 



Ni, 



A 4 T 2 S z 



Al nt (^p^n x n y ) 2 Ai 



all x & y r 

E 



j 2 (k y ,iAy/2)e 



ikx,i{xi-Xj) pikyjiyi-y-j) 



j 2 (k x ^Ax/2)x 

1 



(B8) 



Next, we can turn this form into one that is more clearly 
computationally easy to multiply by a vector by intro- 
ducing another Kronecker delta: 



Na = 



Al nt (Q pix n x n y ) 2 Av 



all x & y 

E 

1 



e ikx,iXi e ik v ,iyi 



all x & y 

E 



j 2 (k x ^Ax/2)j 2 {k y ^Ay/2) 

Sim 



ti(Xi) 



(B9) 



Finally, if we extend I and m to index over all frequency 
channels and all Fourier modes perpendicular to the line 
of sight, we can write down the noise covariance matrix as 
N = F^NFi where Fj_ and are the discrete, unitary 
2D Fourier and inverse Fourier transforms and where N 
can be written as 



Ni, 



A 4 T S 2 i 2 (fe Xji Aar/2)i 2 (A ; ,, / Ay/2) 5 lr , 



Ai nt (fl piyL ) 2 n x n y Av 



U 



(BIO) 



The result, therefore, is a matrix that can be multiplied 
by a vector in 0(N\ogN). 
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Appendix C: Construction of the Preconditioner 

In this final appendix, we show how to construct the 
preconditioner that we use to speed up the conjugate 
gradient method for multiplying C _1 by a vector. We 
devise our preconditioner by looking at C piece by piece, 
building up pairs of matrices that make our covariances 
look more like the identity matrix. We start with C = N, 
generalize to C = U + N, and then finally incorporate R 
and G to build the full preconditioner. 



1. Constructing a Preconditioner for N 

Our first task is to find a pair of preconditioning ma- 
trices that turn N into the identity: 



PnNP 



I. 



(CI) 



Because N = F^NFj_, and because N is a diagonal ma- 
trix, we define Pn and P N as follows: 



p N = ]sr 1/2 Fj 
p f 



1/2 



(C2) 



Since applying Pn only requires multiplying by the in- 
verse square root of a diagonal matrix and Fourier trans- 
forming in two dimensions, the complexity of applying 
Pn to a vector is less than O(NlogN). 



2. Constructing a Preconditioner for U 

The matrix U (Equation [55| can be written as the ten- 
sor product of three Toeplitz matrices, one for each di- 
mension, bookended by two diagonal matrices, Du- Fur- 
thermore, since Du depends only on frequency (as we 
saw in Section 
such that 



HID 2), its effect can be folded into U 2 



DufU, (g) U y <E> U 2 ]Du = <E> U y <g> U' a . 



(C3) 



It is generally the case that XJ X and U y are both well ap- 
proximated by the identity matrix. This reflects the fact 
that the spatial clustering of unresolved point sources 
is comparable with the angular resolution of the instru- 
ment. This assumption turns out to be quite good for 
fairly compact arrays, since for an array with 1 km as 
its longest baseline — the sort of compact array thought 
to be optimal for 21 cm cosmology — we expect an an- 
gular resolution on the order of 10 arcminutes, which is 
comparable to the fiducial value of 7 arcminutes that LT 
took to describe the clustering length scale for unresolved 
point sources. That value appears to be fairly reasonable 
given the results of [551 ISS] ■ For the purposes of devising 
a preconditioner only, we can therefore adopt the simpli- 
fication that 



U 



(C4) 



where we have dropped the prime for notational simplic- 
ity. Looking back at Figure |1J this form of U neatly 
explains the stair-stepping behavior of the eigenvalues: 
for every eigenvalue of U z , U has n x x n y similar eigen- 
values. 

Since only a few eigenvalues of XJ Z are large, it is peda- 
gogically useful to first address a simplified version of the 
preconditioning problem where U z is approximated as a 
rank 1 matrix by cutting off its spectral decomposition 
after the first eigenvalue. We will later return to include 
the other relevant eigenvalues. We therefore write U as 
follows: 



U « I x ®I W ® Av a v z . 



(C5) 



where v z is the normalized eigenvector of U. 

Let us now take a look at the action of Pn and P N on 
U + N: 

P N (U + N)P N 

= I + N-^Fj.fe <8 I y <8> Av.v^F^N- 1 / 2 

= I + N- 1 / 2 ^ 81,® Av.v^N- 1 / 2 

= I + U. (C6) 

Our next goal, therefore, is to come up with a new matrix 
Pu that, when applied to I + U gives us something close 
to I. _ 

We now take a closer look at U. Since it is a good 
approximation to s ay that N only changes perpendicular 
to the line of sightl^jwe can rewrite U: 



U « (N ± 1/2 ® l z )(l x ®I V ® Av zV T)(N-^ <g> l z ) 



-1/2 



(NI 1 ) ® (Av.vt), 



(C7) 



where Nj_ is still a diagonal matrix, though only in two 
dimensions, generated from a baseline distribution aver- 
aged over frequency slices. We now form a pair of precon- 
ditioning matrices, Pu and Pjj of the form Pu = I — /3II 
where n has the property that IIU = U and that 
Unt = U. The matrix that fits this description is: 



v 2 vt) = -U, 



(C8) 



since N only affects the x and y components and thus 
passes through the inner matrix. This also means that 
n = rF and that II = II 2 . The result for Pu(I + U)P]j 
is 



(i-/m)(i + u)(i-/mt) = 

I + U - 2/3U - 2/3II + /3 2 U + p 2 u. 



(C9) 



15 Were it not for the breathing of the synthesized beam with fre- 
quency, N, would only change perpendicularly to the line of sight. 
Since it is a small effect when considered over a modest redshift 
range, we can ignore it in the construction of our preconditioner. 
After all, we only need to make PCPt close to I. 
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The trick now is that for each uv-ce\\, U has only one 
eigenvalue, which we call A/ (again using I as an index 
over both directions perpendicular to the line of sight): 



A; = 



A 



(N ± ) u 



(CIO) 



Likewise, II only has one eigenvalue: 1. By design, these 
eigenvalues correspond to the same eigenvector. Since 
our goal is to have the matrix in equation ( C9 1 be the 
identity, we need only pick /3 such that: 



Solving the quadratic equation, we get 



(Cll) 



Pu-I 



6 x,xi S y,\ 



(C12) 



where the pair of S matrices pick out a particular uv-cell. 
If we want to generalize to more eigenvectors of U z , we 
simply need to keep subtracting off sums of matrices on 
the right hand side of Equation (C12): 




(C13) 



This works because every set of vectors corresponding 
to a value of k is orthogonal to every other set. Each 
term in the above sum acts on a different subspace of C 
independent of all the other terms in the sum. 

If the relevant vectors v Zfc are precomputed, applying 
Pu can be done in 0(Nm(U z )) where m(U z ) is defined 
as the number of relevant eigenvalues of U z that need 
preconditioning or, equivalently, the number of "steps" 
in the eigenvalues of U in Figure [4] above the noise floor. 
We examine how m(U z ) scales with the size of the data 
cube in Section |IIIE2| Because the fall off of the eigen- 
values is exponential [70] we expect the scaling of m to 
be logarithmic. 

In general, we can pick some threshold 6 > 1 to com- 
pare to the largest value of A/,fc for a given k and then 
do not precondition modes with eigenvalues smaller than 
9. One might expect there to be diminishing marginal 
utility to preconditioning the eigenvalues nearest 1. We 
explore how to optimally cut off the spectral decomposi- 
tion in Section [ill E 3| by searching for a value of where 
the costs and benefits of preconditioning equalize. 



and G perpendicular to the line of sight are diagonal- 
ized in completely different bases. However, U, G, and 
R have very similar components parallel to the line of 
sight, due to the fact they all represent spectrally smooth 
radiation of astrophysical origin. 
We can write down R as follows: 



R = E 



S 



!J-Vn 




(C14) 



which can be interpreted as a set of matrices describing 
spectral coherence, each localized to one point source, 
and all of which are spatially uncorrelated. And likewise, 
we can write down G as: 



A 



(C15) 



We now make two key approximations for the purposes of 
preconditioning. First, we assume that all the Zk eigen- 
vectors are the same, so v Zk as v Zn k for all n, all of 
which are also taken to be the same as the eigenvectors 
that appear in the preconditioner for U in Equation |C13| 
Second, as in Section |C 2[ we are only interested in act- 
ing upon the largest eigenvalues of R and G. To this 
end, we will ultimately only consider the largest values 
of A„.fe and \.j,k = A Xi X Vj \ Zk , which will vastly reduce 
the computational complexity of the preconditioner. 

Our strategy for overcoming the difficulty of the differ- 
ent bases is to simply add the two perpendicular parts of 
the matrices and then decompose the sum into its eigen- 
values and eigenvectors. We therefore define 



R+G 



(C16) 



(choosing the symbol T because it looks like R and 
sounds like G). Given the above approximations, we can 
reexpress T as follows: 



E(r±,*®v, fc vtJ, 



(C17) 



where we have defined each T± j, as: 



\ n 



' v,Vr, 



(C18) 



3. Constructing a Preconditioner for R and G 

We now turn our attention to the full matrix C. The 
fundamental challenge to preconditioning all of the ma- 
trices in C simultaneously is that the components of R 



Due to the high spectral coherence of the foregrounds, 
only a few values of k need to be included to precondition 
for r. Considering the limit on angular box size imposed 
by the flat sky approximation and the limit on angular 
resolution imposed by the array size, this should require 
at most a few eigenvalue determinations of matrices no 
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bigger than about 10 4 entries on a side. Moreover, those 
eigenvalue decompositions need only be computed once 
and then only partially stored for future use. In practice, 
this is not a rate-limiting step, as we see in Section [lIIE 2| 
We now write down the eigenvalue decomposition of 



of P r and P N on N + U 



r. Noting that the sole 
' 1 iv ■"■ ■ wc also define 



eigenvalue of r is Ar = ArVj_N_|_"v_i_ 
Au = Auv^N]] 1 ^^. Multiplying our preconditioner by 
our matrices, we see that the the equality of the single 
eigenvalues yields another quadratic equation for /3: 



k 



(j2 Aj.fcVi.jvJ^ v 2fc vl fc . 



(C19) 



Before we attack the general case, we assume that only 
one value of A/ \. is worth preconditioning — we generalize 
to the full Pr later. We now know that if we have a 
matrix that looks like I + U we can make it look like I. 
So can we take I + U + T, where T = PnTP^, and turn 
it into I + U? Looking at T, 



T =N- 1/2 F_ L A r vv t F t l NT 1/2 



=A r (N- 1/2 v ± vlN- i/z )®v 2 vT, (C20) 

where Ar is the sole eigenvalue we are considering and 
where Vj_ = Fj_vj_. 

Again, we will look at a preconditioner of the Pr = 
I — j3H where: 



-A 



-1/2, 



n 



C" T - 1/2 — ~t Ct1/2 

N | ' vj_v I N ' 



A 



(C21) 



' i 1 /2 

This time, the N , matrices do not pass through the 
eigenvectors to cancel one another out. We now exploit 
the spectral similarity of foregrounds and the fact that 



V V_l 



1 to obtain 



P r UP r = U+^(/3 2 
A r 



2/3)r. 



(C22) 



This is very useful because it means that if we pick 
j3 properly, we can get the second term to cancel the 
r terms we expect when we calculate the full effect 



1 + A 



u 



1 



+ A 



2/3 + /? 
Au 



At 



h(/3 2 -2/3 + l)A r 
(/3 2 -2(3). (C23) 



Solving, we finally have our Pr that acts on I 
and yields I + U: 



U 



1 + A 



u 



1 + Au + A 



'tT t -1/2~ ~f Crl/2 



(C24) 



Finally, generalizing to multiple eigenvalues and taking 
advantage of the orthonormality of the eigenvectors, we 
have 



k,rn 



1 + A 



u. 



1 + A Ufc + Ar fe 



(( 



N 



-1/2 



V -L m v i, 



N 



1/2 



(C25) 



The result of this somewhat complicated preconditioner 
is a reduction of the condition number of the matrix to 
be inverted by many orders of magnitude (see Figure |5| . 

Lastly, we include Fourier transforms at the front and 
the back of the preconditioner, so that the result, when 
multiplied by a real vector, returns a real vector. There- 
fore, the total preconditioner we use for C is: 

Fi_P u P r P N (R + U + N + G)Pj SJ P r P u F ± . (C26) 



[1] R. Barkana and A. Loe b, |Phys. Rep. 349, 125 ( 2001) 

|arXiv:astro-ph/0010468l 
[2] S. R. Furlanetto S."P~Oh, and F. H. Brig gs, |Phys. Rep.| 

433, 181 (2006)| |arXiv:astro-ph/0608032[ 
|3| M. F. Morales and J. S. B. Wyithe, | ARAfcA 48, 127 

(2010)HarXiv:0910.3010 [astro-ph CO] 
[4j J. ft~Pritchard and A. Loe b, Repo rts on P rogress 



in Physics 75, 086901 (2012)[ |arXiv:1 109.6012 [astro- 



ph.COjl 

[5] M. McQuinn, O. Zahn, M. Zaldarriaga, L. Hernquist, 
and S. R. Furlanetto, ApJ 653, 815 (2006J |arXiv:astr6^| 



ph/0512263| 

[6j M. G. Santos and A. Cooray, |Phys. Rev. D 74, 083517| 

(2006)HarXiv:astro-ph/0605677| 
|7| J. D. Bowman M. F. Morales, and J . N. Hewitt, |ApJ| 



661, 1 (2007)HarXiv:astro-ph/0512262 
[8] S. R. Furlanetto, A. Lidz, A. Loeb, M. McQuinn, J. R 



Pritchard, P. R. Shapiro, M. A. Alvarez, D. C. Backer, 
J. D. Bowman, J. O. Burns, C. L. Carilli, R. Cen, 
A. Cooray, N. Gnedin, L. J. Greenhill, Z. Haiman, J. N. 
Hewitt, and et al., in astro2010: The Astronomy and As- 
trophysics Decadal Survey, Astronomy, Vol. 2010 (2009) 
pp. 82-+. 



[9] J. S. B. Wy ithe, A. Loeb, and P. M. Geil, |MNRA S 383, 
1195 (2008)| 

[10] T. ChangTU. Pen, J. B. Peterson, and P. McDonal d, 
|Phys. Rev. L ett. 100, 091303 (2 008) arXiv:0709.3672]_ 
[11] P. Madau, A. Meiksin, and M J. Rees, |ApJ 475, 429 



(1997)[ |arXiv:astro -ph/9608010| 
[12] P. Tozzi, P. Madau, A. Meiksin, and M . J. Rees, |ApJ| 



528, 597 (2000), a rXiv:astro-ph/9903139| 
[13] P. Tozzi, P. Madau, A. Meiksin, and M. J. Rees, Nuclear 
Physics B Procee dings Supplements 80, C509+ (2000), 
|arXiv:astro-ph/9905199| 



31 



[16 



17 



[18 



[19 

[20; 

[21 
[22 



[23 



[24 



125 



126 



I. T. Iliev, P. R . Shapir o, A. Ferrara, and H. Martel 
| ApJL 572, L123 (2002)||arXiv:astro-ph/02024io] 



R. Furlanetto, A. Sokasian, and L. Hernquist, MNRAS 
347, 187 (2004)| |arXiv:astro-ph/0305065| 



A. Loeb and M. Zaldarriaga, Physical Review Letters 92, 
211301 (2004)] |arXiv:astrc-ph/0312134| 



S. R. Furlanetto, M. Zaldarriaga, and L. Hernquist, ApJ 
613, 16 (2004), ar Xiv:astro- ph/0404112l 



R. Barkana and A. Loeb, ApJ 626, 1 (2005) arXiv:astro- 
ph/0410129l 

K. J. Mack and D. H. Wesley, ArXiv e-prints (2008), 
larXiv:0805.f53Tl 

Y. Mao, M. Tegm ark, M. McQuinn, M. Zaldar- 
riaga, and O. Z ahn, |Phys. Rev. D 78, 023529 (2008)| 
larXiv:0802.T7T0l 

S. Clesse, L. Lopez-Honorez, C. Ringeval, H. Tashiro, 
and M. H. G. Tytgat, |Phys. Rev. D 86, 123506 (2012)| 
|arXiv:1208.4277 [astro-ph.COjl 

A. de Oliveira-Costa, M. Tegmark, B. M. Gaensler, 
J. Jonas, T L. Landecker, a nd P. Reich, MNRAS 388, 
247 (2008)| |arXiv:0802.1525l 

G. Bernardi, D. A. Mitchell, S. M. Ord, L. J. Greenhill, 

B. Pindor, R. B Wayth, and J. S. B. Wyithe, |MNRAS| 
413, 411 (20TT)||arXiv:1012.3719 [astro-ph.CO]T~ 
A. Liu and M. Tegmark, Physical Review D 83, 103006 



(2011)| ^rXiv:1103. 0281 [astro-ph.CO] | 
M. A. Garrett, ArXiv e-prints (2009), arXiv:0909.3147 

: .n l 



[astro-ph.IM]] 

G. Paciga, T.-C. Chang, Y. Gupta, R. Nityanada, J. Ode- 
gova, U.-L. Pen, J. B. Peterson, J. Roy, and K. Sigurd- 
son, IMNRAS 413, 1174 (2011)| |arXiv:1006 1351 [astro- 



ph.COjl 

[27J S. J. Tingay, R. Goeke, J. D. Bowman, D. Emrich, S. M. 
Ord, D. A. Mitchell, M. F. Morales, T. Booler, B. Crosse, 
D. Pallot, A. Wicenec, W. Arcus, D. Barnes, G. Bernardi, 
F. Briggs, S. Burns, J. D. Bunton, R. J. Cappallo, 
T. Colegate, B. E. Corey, A. Deshpande, L. deSouza, 
B. M. Gaensler, L. J. Greenhill, J. Hall, B. J. Hazelton, 

D. Heme, J. N. Hewitt, M. Johnston-Hollitt, D. L. Ka- 
plan, J. C. Kasper, B. B. Kincaid, R. Koenig, E. Kratzen- 
berg, C. J. Lonsdale, M. J. Lynch, B. McKinley, S. R. 
McWhirter, E. Morgan, D. Oberoi, J. Pathikulangara, 
T. Prabu, R. A. Remillard, A. E. E. Rogers, A. Roshi, 
J. E. Salah, R. J. Sault, N. Udaya-Shankar, F. Schla- 
genhaufer, K. S. Srivani, J. Stevens, R. Subrahmanyan, 
S. Tremblay, R. B. Wayth, M. Waterson, R. L. Webster, 
A. R. Whitney, A. Williams, C. L Williams, and J. S. B 
Wyithe , ArXiv e-prints (2012), [arXiv: 1206.6945 [astro-| 
ph.IM] 

[28] A. R. Parsons, D. C. Backer, G. S. Foster, M. C. H. 
Wright, R. F. Bradley, N. E. Gugliucci, C. R. Parashare, 

E. E. Benoit, J. E. Aguirre, D. C. Jacobs, C. L. Car- 
illi, D. Heme, M. J. Lynch, J. R. Manley, and D. J. 



Werthimer, AJ 139, 1468 (2010) arXiv:0904.2334 [astro 



ph.CO] 

[29J C. L. Williams, J. N. Hewitt, A. M. Levine, A. de 
Oliveira-Costa, J. D. Bowman, F. H. Briggs, B. M. 
Gaensler, L. L. Hernquist, D. A. Mitchell, M. F. Morales, 
S. K. Sethi, R. Subrahmanyan, E. M. Sadler, W. Arcus, 
D. G. Barnes, G. Bernardi, J. D. Bunton, R. C. Cappallo, 
B. W. Crosse, B. E. Corey, A. Deshpande, L. deSouza, 
D. Emrich, R. F. Goeke, L. J. Greenhill, B. J. Hazel- 
ton, D. Heme, D. L. Kaplan, J. C. Kasper, B. B. Kin- 
caid, R. Koenig, E. Kratzenberg, C. J. Lonsdale, M. J. 



Lynch, S. R. McWhirter, E. H. Morgan, D. Oberoi, 
S. M. Ord, J. Pathikulangara, T. Prabu, R. A. Remil- 
lard, A. E. E. Rogers, D. Anish Roshi, J. E. Salah, R. J. 
Sault, N. Udaya Shankar, K. S. Srivani, J. B. Stevens, 
S. J. Tingay, R. B. Wayth, M. Waterson, R. L. Webster, 
A. R. Whitney, A. J. Williams, and J. S. B. Wy ithe, 
|ApJ 755, 47 (2012)1 larXiv: 1203.5790 [astro-ph.CO]| 

[30] L. J. Greenhill and G. Bernard i, ArXiv e-prints (2012), 
|arXiv:1201.1700 [astro-ph.CO]| 

[31] M. Tegmark and M. Zaldarriaga, [Rhys. Re v. D 82, | 



103501 (2 010), arXiv:0909.0001 [astro-ph. CO]T 
[32] C. L. Carilli arid S. Rawl ings, |New A Rev. 48, 979 (2004)| 

|arXiv:astro-ph/0409274| 

[33] U.-L. Pen, |MNRAS 346, 619 (2003) 
ph/0304513 



arXiv:astro- 



[34] P. M. S utter, B. D. Wandelt, and S. S. Malu, ||ApJS 202^ 

9 (2012)[|arXiv:1109.4640 [astro-ph.CO] | 
[35] X. Wang, M. Tegmark, M. G. Santos, and L. Knox, [ApJ 
650, 529 (2006)| |arXiv:astro-ph/0501081 



[36] L. Gleser, A. Nusser, and A . J. Benson, |MNRAS 391, 



383 (2008), arXiv:0712.0497 
[37] J. D. Bowman, M. F. Morales, and J. N. Hewitt, |ApJ| 



695, 183 (2009), arXiv:0807.3956 
38] A. Liu, M. Tegmark, and M Zaldarriaga, |MNRAS 394] 



1575 (2009) , arXiv:0807.3952j 
[39] V. Jelic, S. Zaroubi, P. Labropoulos, R. M. Thomas, 
G. Bernardi, M. A. Brentjens, A. G. de Bruyn, B. Cia- 
rdi, G. Harker, L. V. E. Koopmans, V. N. Pandey, 
J. Schaye, and S. Yatawatta, |MNRAS 389, 1319 (2008)] 
larXiv:0804.TT30l 
[40] G. Harker, S. Zaroubi, G. Bernardi, M. A. Brentjens, 
A. G. de Bruyn, B. Ciardi, V. Jelic, L. V. E. Koopmans, 
P. Labropoulos, G. Mellema, A. Offringa, V. N. Pandey, 
J. Schaye, R. M. Thomas, and S. Yatawatta, |MNRAS| 



397, 1138 (2009), arXiv:0903.2760 [astro-ph.CO] 
|41| A. Liu, M. Tegmark, J . Bowman, J. He witt, and M. Z al- 

darriaga, MNRAS 398, 40 1 (2009)| |arXiv:0903.4890"[ 
[42] G. Harker, S. Zaroubi, G. Bernardi, M. A. Brent- 
jens, A. G. de Bruyn, B. Ciardi, V. Jelic, L. V. E. 
Koopmans, P. Labropoulos, G. Mellema, A. Offringa, 
V. N. Pandey, A. H. P awlik, J. Schaye, R. M 
Thomas, and S. Yatawatta, [MNRAS 405, 2492 (2010")] 
|arXiv:1003.0965 [astro-ph.CO] \~ 



[43] J. Cho , A. Lazarian, and P. T . Timbie, A pJ 749, 164 
I (2012)| |arXiv:1203. 5197 [astro-ph.CO] | 
[44] M. McQuinn, L. Hernquist, M. Zaldarriaga, and 

S. Dutta, pViNRAS 381, 75 (2007)||arXiv:0704. 2239 
[45] M. Tegmark, A. J. S. Hamilton, M. A. St rauss, M. S. 

Vogeley, and A. S. Szalay, ApJ (1998), arXiv:astro- 

ph/970802"0| 

[46] R. Barkana and A. Loe b, |ApJ Lett. 624, L65 (2005) 
|arXiv:astro-ph/0409572] 



[47] S. S. Ali, S . Bharad waj, and B. Pa ndey, MNRAS 363, 
251 (2005)] |arXiv:astro-ph/0503237[ 



|48| A. Nusser, |MNRAS 364, 743 (2005)| |arXiv:astro- 



ph/0410420| 

'[49] R. Barkana , |MNRAS 372, 259 (2006)] |arXiv:astro- 
ph/0508341 



[50] M. Tegmark , |Phys. Rev. D 55, 5895 (1997)||arXiv:astro- 
ph/961l"l74[ 



[51] J. R. Bond, A. H. Jaffe, and L. Knox , |Phys. Rev. PWf, 
I 2117 (1998)| |arXiv:astro-ph/9708203| 
[52] R. A. Fisher, J. Roy. Stat Soc. 98, 39 (1935). 
[53] M. Tegmark, A. J. S. Hamilton, and Y. Xu, MNRAS 



32 



335, 887 (2002), arXiv:astr o-ph/0111575| 
541 T. Di Matteo, R. Perna, T. Abel, and M. J. Rees. 



ApJ 



Rev. D 79 



564, 576 (2002), arXiv:astro-ph/0109241 
[55] M. Tegmark and M. Zaldarriaga, Phys 

083530 (2009)| |arXiv:0805.4414| 
[56j D. W. Hogg, ArXiv Astrophysics e-prints (1999), 

|arXiv:astro-pn/9905116l 
[57] S. P. Oh, D. N. SpergelT and G. Hinshaw, |ApJ 510, 551| 

(1999)| [arXiv:astro-ph/9805339| 
[58] N. Jarosik, C. Barnes, M. R. Greason, R. S. Hill, M. R. 
Nolta, N. Odegard, J. L. Weiland, R. Bean, C. L. Ben- 
nett, O. Dore, M. Halpern, G. Hinshaw, A. Kogut, E. Ko- 
matsu, M. Limon, S. S. Meyer, L. Page, D. N. Spergel, 
G. S. Tucker, E. Wollack, and E. L. Wright, ApJS 170, 



[59] 



263 (2007), arXiv:astro-ph/0603452 



[60] 
[61] 



M. R. Hestenes and E. Stiefel, Journal of Research of the 
National Bur eau of Standards 49, 409 (1952). 
J. Shewchuk, unpublished, available online (1994) 
A. Datta, J. D. Bowman, and C. L. Cari lli, |A"pJ 724, 
526 (2010)] |arXiv: 1005.4071 [astro-ph.CO]| 
[62] H. Vedantham, N. Udaya Shankar, and R. Subrah- 
manyan , |ApJ 745, 176 (2012)| |arXiv:1106.1297 [astro-| 
ph.IM]| 



[67] 
[68] 



[63] M. F. Morales, B. Hazelton, I. Sullivan, and A. B eards 
ley, |ApJ 752, 137 (2012)[ |arXiv: 1202. 3830 [astro-ph.iM 
[64] C. M. Trott, R. B. Wayth, and S. J. Ting ay, |ApJ 757 
101 (20l2)]|arXiv: 1208.0646 [astro-ph.CO]| 
T. A. Wood and G. Chan, Journal of Computational and 
Graphical Statistics 3, 409 (1994). 

A. Liu, M. Tegmark, S. Morrison, A. Lutomirski, 
and M. Zaldarriaga, |MNRAS 408, 1029 (2010)"] 



[65] 



arXiv:1001.5268 [astro-ph.IM] 
R. M. Gray, Foundations and Trends in Communications 
and Information Theory 2, 155 (2006). 
G. Bernardi, A. G. de Bruyn, M. A. Brentjens, B. Ciardi, 
G. Harker, V. Jelic, L. V. E. Koopmans, P. Labropoulos, 
A. Offringa, V. N. Pandey, J. Schaye, R. M. Thomas, 
S. Yatawatta, and S. Zaroub i, |AfcA 500, 965~ (2009) 



N. 



arXiv:0904.04 04 [astro-ph.CO]| 

A. Ghosh, J. Prasad, S. Bharadwaj, S. S. Ali, and J. 



Chengalur, MNRAS 426, 3295 (2012) arXiv: 1208. 1617 



| [astro-ph.COj| 
[70] A. Liu and M. Tegmark, |MNRAS 419, 3491 (2012)| 
|arXiv:1106.0007 [astro-ph.COj| " 



